
It’s interesting to me how frequently people will talk about sampling performance profilers and not know about Nyquist.

Given that a CPU runs at many GHz but software sampling profilers run at ~1 kHz, or at most 10 kHz, it’s really hard to profile software if you’re targeting processing at MHz rates.



I'm not sure what you mean here, but I would say that sampling profilers are a good example of Nyquist applied correctly for the most part: your signal (the distribution of where your software is spending its time) is likely to have an extremely low bandwidth (as in, it's basically static), so why try to sample super rapidly? It's much more of a question of whether you get enough samples to represent that distribution accurately, not how quickly you get those samples.
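
A toy simulation of that point (made-up function names and a made-up distribution; nothing profiler-specific):

```python
import random

# Hypothetical "true" time distribution of a program: the fraction of
# total runtime spent in each function. The signal is essentially
# static, so accuracy depends on sample count, not sample rate.
true_profile = {"parse": 0.60, "hash": 0.30, "log": 0.10}

def take_samples(n, rng):
    """Simulate n profiler samples drawn from the true distribution."""
    funcs = list(true_profile)
    weights = [true_profile[f] for f in funcs]
    counts = {f: 0 for f in funcs}
    for _ in range(n):
        counts[rng.choices(funcs, weights)[0]] += 1
    return {f: c / n for f, c in counts.items()}

rng = random.Random(0)
# With ~10k samples the estimate is close to the true distribution,
# whether they were collected over one second or one hour.
estimate = take_samples(10_000, rng)
```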


You’re talking about profiling a normal running program. If you’re profiling a benchmark that’s always executing the hot path, where exactly the hot path contribution lies becomes less clear. This is because you run into aliasing, with the samples being collected at unhelpful points. Maybe if you run long enough you start to see a picture, but at 1 kHz you’re going to have to run for a very long time. The other way aliasing comes into effect is that it becomes hard to see the impact of “individually cheap” but often executed pieces of code (eg a simple bounds check may not show up even if it’s responsible for a 20% slowdown because the probability of your sampler hitting it is small when that bounds check takes nanoseconds to execute vs your ms profiler sample rate).

Basically a 1 ms sampler can at best pick out a signal that’s 2 ms or longer in periodicity (all faster signals will get aliased). To get to 1 GHz (once a nanosecond) would require capturing a million times more samples, and you’re still dealing with aliasing screwing up the picture you’re getting (although maybe with stack sampling you get disambiguation to combat aliasing? Not sure).
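
A toy illustration of that aliasing concern (made-up microsecond numbers, a perfectly regular sampler, pure simulation):

```python
# A periodic event (say, a hot function entered every 300 us for 1 us)
# observed by a sampler that fires exactly every 1000 us (1 kHz).
# All times are in microseconds; the numbers are invented.
EVENT_PERIOD = 300      # event repeats every 300 us
EVENT_WIDTH = 1         # event lasts 1 us each time
SAMPLE_PERIOD = 1000    # 1 kHz sampler

def in_event(t):
    return (t % EVENT_PERIOD) < EVENT_WIDTH

# The sample phase mod 300 cycles through 0, 100, 200, 0, ... so one
# third of all samples land exactly on the event, even though it only
# runs ~0.33% of the time: aliasing inflates it 100x.
hits = sum(in_event(i * SAMPLE_PERIOD) for i in range(10_000))
true_duty = EVENT_WIDTH / EVENT_PERIOD   # ~0.0033
measured = hits / 10_000                 # ~0.33

# Shift the sampler's start by 150 us and the phases become
# 150, 250, 50, ... so the same event never shows up at all.
hits_shifted = sum(in_event(i * SAMPLE_PERIOD + 150) for i in range(10_000))
```

Depending on the relative phase, the same fast event can be wildly over- or under-represented, which is the distortion being described.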


I don't think what you're talking about is aliasing: it's more to do with the statistics of sampling. Though even then I still don't quite get what you mean: if a bounds check is 20% of your runtime then you're going to see it in your samples pretty quickly. If it's a small enough fraction of your runtime that you don't expect to see it in millions of samples, then why is it relevant to your performance? Now, if you're worried about latency outliers, I can see why sampling may not be a useful tool, but again I don't think the reason for that is really aliasing.


If the 20% is a hotspot yes. If the 20% is because it’s been inlined and split across 100 different call sites each contributing 0.2%, I don’t think it’s so easy to spot.
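
A sketch of that scenario (hypothetical symbol names, pure simulation): per call site the check is lost in the noise, but grouping samples by symbol recovers the total.

```python
import random

# A check inlined into 100 call sites, each 0.2% of runtime (20% in
# total), plus one legitimately hot function at 10%. Numbers invented.
sites = {f"site_{i:02d}/bounds_check": 0.002 for i in range(100)}
sites["hot_loop"] = 0.10
sites["everything_else"] = 1.0 - sum(sites.values())

rng = random.Random(3)
names = list(sites)
weights = [sites[s] for s in names]
counts = {s: 0 for s in names}
for _ in range(50_000):
    counts[rng.choices(names, weights)[0]] += 1

# Per-site view: each bounds_check site shows ~0.2%, easy to dismiss.
# Grouped view: summing across sites recovers the ~20% total.
check_total = sum(c for s, c in counts.items() if "bounds_check" in s) / 50_000
```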


Sure, but for basically the same reason that it's difficult to spot in a trace: you need to classify all those segments of code as the same thing.


> a simple bounds check may not show up even if it’s responsible for a 20% slowdown because the probability of your sampler hitting it is small when that bounds check takes nanoseconds to execute vs your ms profiler sample rate.

Surely for this to happen you'd have to be putting a lot of effort into getting a perfect 1 ms sampling rate, and even a little bit of variation in that would be more than enough to handle aliasing issues.


It’s been a few years, but if I recall correctly the fact that there’s variation in the sampling itself makes the aliasing worse, not better. At the very least it should be no different.


I wonder if we're talking about different things.

It sounds like one of your concerns is being catastrophically unlucky with the sampling rate:

> This is because you run into aliasing with the samples being collected at unhelpful points.

I interpret this as you saying "we sample at times T, 2T, ..., but there's a hotspot that hits at T+0.001, T+0.003, ..., T+0.999, 2T+0.001, ..., and we never get to visit it". I'll grant that this could happen, although it seems contrived, but my claim is that by sampling at "T +/- epsilon, 2T +/- 2 epsilon, ..." sooner or later you're going to start hitting that hotspot. And before too long, if the CPU is executing that code, say, 5% of the time, you're going to hit it, on average, 5% of the time.
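
The "T +/- epsilon" argument can be simulated directly (toy numbers; the jitter magnitude is invented):

```python
import random

# A hot stretch occupies the first 5% of every 1000-unit period.
# The sampler aims for one sample per period, but its clock picks up
# some random error on every tick, so the error accumulates.
PERIOD = 1000
HOT = 50                     # hot window is [0, 50) in each period

def in_hotspot(t):
    return t % PERIOD < HOT

rng = random.Random(42)
t, hits, n = 500.0, 0, 200_000   # start mid-period, outside the window
for _ in range(n):
    t += PERIOD + rng.uniform(-100, 100)   # jittered sample interval
    hits += in_hotspot(t)

# A perfectly periodic sampler starting at phase 500 would score zero
# hits forever; the jittered one drifts across all phases and ends up
# hitting the hotspot roughly in proportion to its 5% occupancy.
observed = hits / n
```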

It'll be aliased, sure, but in a way that smears across all frequency bins instead of getting missed. You won't be able to recover the true frequency (at least not without fancy sparse methods) but why do you care? The important question is "where is the CPU spending its time" and not "is this function being entered 100000 times per second or 100001 times".

Here's another general objection. The things being sampled are square waves: a function is either in the call stack or it's not, the program counter is either at a particular location or it's not, and so on. That means you're going to have energy at all odd partials, which you'll have to account for somehow, but however you do it it's not going to reflect the underlying behavior.


It’s not about being unlucky; it’s the second thing you point out. It’s not the difference between 100 kHz and 100.001 kHz. You’re going to roughly alias at multiples of the sampling frequency, so you can’t tell between 2 kHz, 10 kHz, and 100 kHz, which means that even if you collect enough signal, your 1 MHz signal with code running at 1 GHz is going to potentially look like a 10 kHz signal, leading you to make the wrong conclusions about where time is being spent.

Remember: profiling tools aren’t even giving you a frequency bin and indicating which samples fall there; they’re giving you samples and estimating frequency. Most people are not optimizing code that’s running millions of times per second, so it’s not a common problem, but all sorts of wrong conclusions start to get made.


> so you can’t tell between 2 kHz, 10 kHz, and 100 kHz

Only when your sample rate is perfect!

Let's say we have a 1 GHz system executing one instruction per cycle, a signal at 1 MHz, and we're measuring at 1 kHz. If we're rock-solid at 1 kHz then, as you said, we can't distinguish 1 MHz from 2 kHz, but if we're off by a little bit, then things change.

Let's say the 1 MHz events are at times 50, 1050, 2050, and so on (in nanoseconds), and the 1 kHz sampler triggers at 0, 1000000, 2000000, and so on. You'll obviously never see that event.

But suppose there's a little bit of noise in the sample-rate timing and we're just under 1 kHz, so now the samples are at 0, 1000001, 2000001, 3000005, etc. Sooner or later we're going to hit that 1 MHz event, and we're not going to hit it at the same rate as we would a 2 kHz event, a 10 kHz event, and so on.

We might not hit it that often, but we'll hit it sooner or later, and we'll also hit a lot of its neighbors.
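
Those numbers can be checked with a toy simulation (times in nanoseconds; an event width of one instruction is assumed):

```python
# Numbers from above: a 1 MHz event at times 50, 1050, 2050, ... (ns),
# sampled at 1 kHz (every 1_000_000 ns).
EVENT_PERIOD = 1000        # ns -> 1 MHz
SAMPLE_PERIOD = 1_000_000  # ns -> 1 kHz

def on_event(t_ns):
    # the event occupies 1 ns at offset 50 within each period
    return t_ns % EVENT_PERIOD == 50

n = 100_000
# Rock-solid 1 kHz: every sample lands at phase 0, never at phase 50.
perfect_hits = sum(on_event(i * SAMPLE_PERIOD) for i in range(n))

# A clock that runs 1 ns long per sample: the phase drifts by 1 ns
# each tick and sweeps the whole period, so we do hit the event, at a
# rate matching its 1/1000 duty cycle (100 hits in 100_000 samples).
drift_hits = sum(on_event(i * (SAMPLE_PERIOD + 1)) for i in range(n))
```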


You’re not describing anything different from how Nyquist pops up in analog signals either: sampling and signal will always have jitter and noise. I would recommend reading the linked PDF if you haven’t done so, as it gives a number of examples of how Nyquist can screw up your intuition.


I read it when it was first posted, and skimmed it again just now to be sure, and my conclusion is unchanged: Nyquist has nothing useful to say about sampling profilers. I think the author would agree with me. See, for example, the discussion of EKG signals, or this bit from the intro:

> The difficulty with the Nyquist-Shannon sampling theorem is that it is based on the notion that the signal to be sampled must be perfectly band limited. This property of the theorem is unfortunate because no real world signal is truly and perfectly band limited.

whose relevance I'll get to below.

(I'm assuming that by "sampling profiler" you mean the usual practice of recording on a cadence the call stack / program counter / etc of some code; if this is not what you meant please clarify)

If you approach sampling profiling from a signals viewpoint, what's the signal that you're sampling? I see it as a collection of pulse waves of varying (and irregular!) duty cycles, each pulse wave corresponding to a yes/no answer to "is the program inside this function / executing this instruction / etc". At any given sample we collect our data and that tells us which waves are high; all others will be low.

Nyquist, as the above quote points out, only applies to perfectly band limited signals. Not only are pulse waves not perfectly band limited, they're actually extremely not band limited, with energy at harmonics going all the way up. Right away that should tell you that Nyquist is the wrong way to be thinking about things!

And furthermore, Nyquist tells you what you need if you want to reconstruct the original signal, but why do you want to do that in the first place? Do you actually care about the phase of the wave and the timings of the pulses, or do you just care about how often the wave is high? (i.e. how often any bit of code is executed). I don't think I've ever cared about the phase and timings when profiling, but I do care very much about how often the wave is high.
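
That framing can be sketched directly: sample a pulse wave at random times and estimate only its duty cycle, making no attempt to recover frequency or phase (made-up numbers):

```python
import random

# A pulse wave: high for 30% of each period ("function on the stack"),
# low otherwise. We sample at uniformly random times, the opposite of
# what Nyquist-style reconstruction wants, and recover only the
# statistic we care about: the fraction of time spent high.
PERIOD = 37.0        # arbitrary; the estimate doesn't depend on it
DUTY = 0.30

def is_high(t):
    return (t % PERIOD) < DUTY * PERIOD

rng = random.Random(7)
n = 100_000
high = sum(is_high(rng.uniform(0, 1e6)) for _ in range(n))
estimate = high / n   # converges to DUTY; phase and frequency are lost
```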


One possible issue is that if sampling is even slightly biased, it can incorrectly estimate the relative frequency of different points/functions in tight inner function calls (which can't happen with infrequently called functions with a long runtime)?


Fun fact: static is very broadband.



