I ran into this problem with a codec chip. We had to run the output at 48 kHz, and the input and output clocks had to be the same. But we didn't have enough CPU to process the input at 48 kHz, and we only cared about 8 kHz of bandwidth for human speech. Boxcar averaging and then decimating produced way too much aliasing, so much that the ML classifiers wouldn't work. The solution was putting a honking large FIR anti-aliasing filter on the codec, like 100 taps, because -that- chip had oodles of CPU to spare as it turned out.
Why FIR? It had a super sharp cutoff and of course linear phase. If I'd known about Bessel filters at the time I'd have tried those out. Live and learn and ship it.
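For anyone curious, a minimal sketch of the same idea in Python/scipy (tap count, cutoff, and decimation factor are illustrative guesses, not the original design):

    import numpy as np
    from scipy import signal

    fs_in = 48_000     # codec rate (Hz)
    decim = 3          # 48 kHz -> 16 kHz, enough for ~8 kHz of speech bandwidth
    cutoff = 7_000     # Hz; leaves a transition band below the new Nyquist of 8 kHz

    # ~100-tap linear-phase FIR low-pass, the kind of filter described above
    taps = signal.firwin(101, cutoff, fs=fs_in)

    x = np.random.randn(fs_in)                 # stand-in for one second of codec input
    y = signal.lfilter(taps, 1.0, x)[::decim]  # filter at 48 kHz, then keep every 3rd sample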
To clarify - you were able to run the FIR filter on the codec chip because the main CPU didn't have enough <something - clock rate?> to handle the 48 kHz input?
Nifty story, just wanted to validate my understanding.
I could have done that and moved the cutoff frequency back to, oh, let's say 4 kHz, but then I'm adding attenuation at the higher end of the band and nonlinear group delay. That's maybe fine if that's how we processed the training data, but I don't want us to retrain the classifier every time I change the anti-aliasing filter.
Many brilliant points. Particularly about Bessel filters instead of Butterworth. And also about lowering Fc and increasing Fs, rather than increasing order. And also about considering time domain as well as frequency domain.
The dense interview question I use to assess this area of knowledge: "How do you choose the stopband attenuation of a filter?" You can assess a lot from the interviewee's response. Stopband attenuation, with respect to the input signal magnitude at the stopband frequency (or the start of the stopband), is the most relevant term for determining the magnitude of noise in the sampled signal. And that is the upper bound on the performance of the downstream algorithm.
My response would be: "to validate the chosen stopband attenuation, compare the unwanted signal or interference that passes through the filter with the noise that you already have or with other factors that would make this unwanted signal or interference unnoticeable".
That's a good way of looking at it. The "unnoticeable" part is where more can be teased out:
That is, to expect a specification for the desired S/N (signal-to-noise) ratio and bandwidth, and/or bit rate at which the downstream algorithm is expected to work. This doesn't have to be a perfect estimate/evaluation/measurement, but it helps to get the ballpark amount of computation power, delay, etc. needed to actually make the system work. All too often I see people react to a noisy signal, by "putting a filter on it," without knowing how much attenuation/gain/phase shift, etc. are actually needed to make the system work acceptably.
E.g. You don't want to spend too long designing the perfect filter... And you want to know if there's any hope at all cleaning up the signal to make use of it!
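A back-of-envelope sketch of that reasoning (all numbers made up): push the strongest out-of-band component below whatever level already makes it unnoticeable downstream.

    interferer_level_db = -10.0   # strongest out-of-band component at the stopband edge, dBFS
    unnoticeable_db     = -70.0   # existing noise floor / level the downstream algorithm tolerates, dBFS
    margin_db           = 6.0     # a bit of safety margin

    required_stopband_attenuation_db = interferer_level_db - unnoticeable_db + margin_db
    print(required_stopband_attenuation_db)   # 66 dB for these example numbers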
The biggest one I have run into is that the bandwidth doesn't need to be contiguous. If you know that the important bits of the signal are contained in certain frequency bands, then you can get away with much lower sampling rates. Basically why things like L1 reconstruction work.
Bandpass sampling is a regular technique used in the RF world to mix a signal down to its final IF or baseband frequency, digitally. In this case you are undersampling, but with a well-defined signal that you have a priori knowledge of, so you know where it sits before and after sampling.
This is the most lucid explanation of aliasing I think I've ever read.
I mostly deal with this stuff in the realm of audio where the matter of analog vs digital rages regularly. The reason one hears "Nyquist says" so often from people defending digital audio is that the pro-analog people are imagining stair-stepped signals in which the information that is "lost" results in a noticeable degradation in sound quality. This (11-year-old) video is the gold standard for addressing this concern:
That depends on the profession I think. If you are designing A/D or D/A converters, you know sampling and all the artefacts by heart. Nyquist criterion is extremely important, but it is simplistic. You need to consider a lot of other stuff like out-of-band noise and harmonics, reconstruction filters etc etc.
We have periodic steady state analysis to simulate sampled systems and harmonic transfer functions, by the way. It is extremely useful, especially to simulate noise aliasing.
This could be a useful disclaimer, to say "we are talking about recording and processing and NOT about playback".
Audiophiles demand streaming services to provide 192/24 because they see the music being originally mastered at high sample rates, and from that conclude that listening to 48/16 is a loss of "original recorded quality".
I can totally envision audiophiles picking up Wescott's article and using it as a "scientific" argument to distribute more Hi-Res music, and some half-technical manager buying that. They won't even read it.
The noise floor of 16-bit music is imperceptible, so, for mastered music at least, 24 bits is useless. Both a higher sample rate and a higher bit depth can be useful for recording purposes - provided the room is perfectly treated and there are no external sources of noise.
We know that 2^10 ≈ 10^3, so 2^24 ≈ 10^(0.3·24) = 10^7.2. That 7.2 is in terms of amplitude, but what we hear is power, which is quadratic in amplitude, so take a factor of 2 and you're talking 14.4 bels, or 144 dB, for 24 bits (and 2/3 of that, 96 dB, for 16 bits).
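Checking that approximation exactly, since dB is 20·log10 of the amplitude ratio:

    import math

    for bits in (16, 24):
        print(bits, "bits ->", round(20 * math.log10(2 ** bits), 1), "dB")
    # 16 bits -> 96.3 dB, 24 bits -> 144.5 dB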
If you're thinking "The highest rate I need my signal to be able to replicate is X, so I should set my sampling rate to 2X," then you're wrong and this article gives several reasons why.
As far as I can tell, though, it doesn't mention what may be the most important reason (especially to the folks here at hackernews): resampling and processing.
This is why professional grade audio processing operates at a sample rate many multiples higher than human hearing. It's not because of the quality difference between, say, 192 and 96 kHz, but rather if you're resampling or iterating a process dozens of times at those rates, eventually artifacts will form and make their way into the range of human hearing (20 kHz).
You’re right, but I fear this idea has become prevalent in audiophile communities where they only want to listen to files that are 96kHz or higher.
In my opinion, having a high sample rate only really matters during the production phase and does not have a noticeable effect on the final form factor. If the producer uses high sample rate during the creation process, I see no reason why the listener would care if the file they’re listening to is higher than even 44.1kHz unless they are planning on using it for their own production.
People should prefer 48k over 44.1k, but not for fidelity. It would just make the world a better place if 44.1k audio files died out. The reasons it was chosen are invalid today and we're stuck with it, and now every audio stack needs to be able to convert between 44.1/88.2 and 48/96. That's a solved problem, but it has a tradeoff between fidelity and performance that makes resampling algorithms a critical design feature of those stacks.
All because Sony and Philips wanted 74 minutes of stereo audio on CDs decades ago.
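For reference, the conversion every stack ends up carrying around is a rational one: 48000/44100 reduces to 160/147. A minimal polyphase sketch with scipy, filter quality left at the library defaults:

    import numpy as np
    from scipy import signal

    x = np.random.randn(44_100)                    # stand-in for one second of 44.1 kHz audio
    y = signal.resample_poly(x, up=160, down=147)  # polyphase: up by 160, down by 147
    print(len(x), len(y))                          # 44100 48000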
It's very likely that the 44.1 kHz rate comes from the PCM adaptors that were designed to take PCM audio and convert it to something that a video tape recorder would accept.
I watched a YouTube video a few months ago about these adaptors, and the presenter did the calculations showing how the 44.1 kHz, 16-bit format lines up with the video fields. There was a valid engineering reason for this sampling rate.
However, the stories about one of the Sony executives having a particular piece of music in mind are true, and have to do with the diameter of the disk being enlarged compared to what Philips originally had in mind. By that time the bitrate was already decided.
I still agree that 48 kHz is a better choice today, especially after reading this paper.
> Kees Immink, Philips' chief engineer, who developed the CD, recalls that a commercial tug-of-war between the development partners, Sony and Philips, led to a settlement in a neutral 12-cm diameter format. The 1951 performance of the Ninth Symphony conducted by Furtwängler was brought forward as the perfect excuse for the change,[76][77] and was put forth in a Philips news release celebrating the 25th anniversary of the Compact Disc as the reason for the 74-minute length.
While audio equipment and algorithms don't care about nice-looking numbers, I think the actually useful property is that 48000 has more favorable prime factors than 44100, which helps with resampling and other tasks.
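Concretely, 44100 = 2^2·3^2·5^2·7^2 while 48000 = 2^7·3·5^3, so conversions involving 44.1 kHz drag a factor of 7^2 into the resampling ratios. A quick check by trial division:

    def prime_factors(n):
        """Return {prime: exponent} for n, by trial division."""
        factors, p = {}, 2
        while p * p <= n:
            while n % p == 0:
                factors[p] = factors.get(p, 0) + 1
                n //= p
            p += 1
        if n > 1:
            factors[n] = factors.get(n, 0) + 1
        return factors

    print(prime_factors(44_100))  # {2: 2, 3: 2, 5: 2, 7: 2}
    print(prime_factors(48_000))  # {2: 7, 3: 1, 5: 3}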
The same could be said about bit depth: 24 bits produces far fewer quantization artifacts than 16 bits, and those artifacts can readily show up during production processes such as dynamic range compression, but they are extremely well hidden by dithering with noise shaping, which gets applied during mastering, so ultimately listeners are fine either way.
However, any type of subsequent processing in the digital domain, even just a volume change by the listener if it's applied digitally in the 16 bit realm (i.e., without first upscaling to 24 bits), completely destroys the benefit of dithering. For that reason, we might say that additional processing isn't confined to the recording studio and can happen at the end user level.
I'm unsure whether this same logic applies to sampling frequency, but probably? I guess post-mastering processing of amplitude is far more common than time-based changes, but maybe DJs doing beat matching?
The real benefit is not using 6x the network bandwidth, storage, memory, processing power, and battery of the mobile device. That benefit is not going anywhere, no matter what.
Post-processing is applied to a signal which is already physically impossible to distinguish from the source. It is true that it often needs higher resolution, and DSPs will upsample internally and then back, and operate on floats. But to claim, without evidence, that post-processing may give a human listener back the ability to tell apart whether a 192/24 medium was used instead of 48/16 would be to reintroduce the same quality-loss paranoia, just with an extra step. If one couldn't hear the difference before an effect was applied... they won't hear it after.
As for DJs, they do use high-res assets when producing mixes. That's still mastering stage, technically.
With music, in particular, if you use any analog sources while recording, the signal will contain so much noise that any dithering signal will be far below the floor and will most likely be completely redundant. I know that people claim to hear a difference, but they also claim to hear a difference between gold and copper contacts.
I hear no difference between undithered 16 bit and anything "better" (e.g. dithered 16 bit, or more bits) and anyone who claims they do should be highly scrutinized, when we're talking about a system (media, DAC, amplification, transducer, human) playing a mastered recording at a moderate volume setting. But I certainly hear the difference (as quantization artifacts) when cranking the volume up to extremely high levels when the source material is extremely quiet, like during a fade out, a reverb tail, or just something not properly mastered to use the full range; setting the volume to something that would totally clip the amp, blow the speakers, or deafen me if it weren't a very quiet part of the recording.
Dithering (or more bits) does solve for this. A fade out of the song also lowers the captured noise floor, but the dither function keeps going.
It's akin to noticing occasional posterization (banding) in very dark scenes if your TV isn't totally crushing the blacks. With a higher than recommended black level, you will see this artifact, because perceptual video codecs destroy (for efficiency purposes) the visual dither that would otherwise soften the bands of dark color into a nice grainy halftone sort of thing which would be much less offensive.
Maybe it's an uncommon scenario these days, but not too terribly long ago I think it was fairly typical for software audio players to be 16-bit and offer volume controls, and anything other than 100% would completely ruin most benefits of dither.
Not just eventually: many effects, basically any non-linear mapping like a distortion, will create overtones that immediately alias down if you are not oversampling. You either need some DSP tricks or oversampling (usually a mix of both) to avoid this, and that often happens inside just one step of an effects chain.
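A minimal sketch of that, assuming a simple tanh-style clipper (real plugins are more careful): a hard-driven 5 kHz tone at 48 kHz produces odd harmonics at 15, 25, 35 kHz and up, and everything above 24 kHz folds straight back down; doing the same nonlinearity 4x oversampled and band-limiting on the way back keeps most of those folded products out of band.

    import numpy as np
    from scipy import signal

    fs, f0 = 48_000, 5_000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * f0 * t)

    naive = np.tanh(4 * x)                # distort directly at 48 kHz: aliases land in-band

    x_up = signal.resample_poly(x, 4, 1)  # 4x oversample to 192 kHz
    better = signal.resample_poly(np.tanh(4 * x_up), 1, 4)  # distort, band-limit, come back down

    # Comparing np.abs(np.fft.rfft(naive)) with np.abs(np.fft.rfft(better)) shows
    # the aliased components sitting far lower in the oversampled version.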
Even the term "oversampling" implies that sampling beyond Nyquist rate is excessive. I think you would agree that one is not being excessive. It is necessary to sample well beyond accepted "Nyquist rate" in order to reconstruct the signal.
Your signal contains all kinds of frequencies: Those you care about and those you don't want in your recording. You can't just sample at the Nyquist rate of the interesting frequency and expect all the other frequencies to vanish. They will mess with the frequencies you are actually interested in.
That is the term, however. You see it in many contexts where a higher sample rate is traded for some other desirable attribute. (For example, it's often desirable for an ADC to sample well above the highest frequency content you care about in an analog signal, for the reasons detailed in the paper as well as because it can give you a lower-noise ADC. Delta-sigma converters are an extreme case of this, helped by a separate trick of noise shaping.)
It's worth noting it's a tradeoff, even in pure processing: almost all non-linear transfer functions will create an infinite number of overtones, so it's impossible to avoid aliasing completely: you can only reduce them to some threshold which is acceptable to the application.
I think you're mixing up the effects of _sample rate_ and _bit depth_ here!
Everything you said about sample rate applies more to bit depth. Higher bit depth (bits per sample) results in a lower noise floor. When audio is digitally processed or resampled, a small amount of noise ("quantization distortion") is added, which accumulates with further processing. This can be mitigated by working at higher bit depths - which is why professional grade audio processing routinely uses 24 bit formats (for storage) and 32-bit or 64-bit floating point internally (for processing), even if the final delivery format is only 16 bit.
Sample rate, on the other hand, affects bandwidth. A higher sample rate recording will contain higher frequencies. It doesn't have any direct effect on the noise floor or level of distortion introduced by resampling, as I understand. (It could have an indirect effect - for example, if certain hardware or plugins work better at particular sample rates.)
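A small sketch of that accumulation (contrived numbers): apply 100 modest gain changes, either re-quantizing to 16 bits after every step or staying in floating point throughout, and compare against the ideal result.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-0.5, 0.5, 48_000)   # arbitrary stand-in for one second of audio
    gains = rng.uniform(0.9, 1.1, 100)

    def q16(v):
        """Round to the nearest 16-bit step (no dither, for simplicity)."""
        return np.round(v * 32768) / 32768

    y16, yfloat = x.copy(), x.copy()
    for g in gains:
        y16 = q16(y16 * g)               # re-quantize after every operation
        yfloat = yfloat * g              # keep full precision throughout

    ideal = x * np.prod(gains)
    print("16-bit chain error:", np.std(y16 - ideal))    # on the order of 1e-4
    print("float chain error: ", np.std(yfloat - ideal)) # on the order of 1e-16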
A survey of ~2,000 professional audio engineers done in May 2023 showed that 75% of those working in music use 44.1 kHz or 48 kHz, while 93% of those working in post production use 44.1 kHz or 48 kHz.[1] These are the basic CD-derived and video-derived sample rate standards.
From this it's clear that even in professional audio, higher sample rates are a minority pursuit. Furthermore, the differences are extremely subjective. Some audio engineers swear by higher sample rates, while others say it's a waste of time unless you're recording for bats. It's very rare (and practically, quite difficult) to do proper tests to eliminate confirmation bias.
Also, when the sampling rates get extreme (software-defined radio), it is well worth moving to complex samples. Doing so allows you to use a sampling rate equal to your theoretical maximum bandwidth, instead of 2x. That's not such a big deal at audio bandwidth, but when your Airspy is slinging a 6 MHz chunk of spectrum, it becomes a huge deal.
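A rough sketch of where that factor of two comes from, with made-up numbers (nothing Airspy-specific): mixing the real input against a complex exponential centers the band of interest at 0 Hz, and the resulting I/Q stream needs only one complex sample per hertz of bandwidth.

    import numpy as np
    from scipy import signal

    fs = 20_000_000                                  # 20 Msps real-valued input
    f_center = 6_000_000                             # center of the band we care about
    t = np.arange(fs // 10) / fs                     # 100 ms of samples
    x = np.cos(2 * np.pi * (f_center + 50_000) * t)  # a tone 50 kHz above center

    iq = x * np.exp(-2j * np.pi * f_center * t)      # quadrature mix-down to baseband
    baseband = signal.resample_poly(iq, 1, 3)        # ~6.7 Msps complex, still covers >6 MHz of bandwidth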
Another factor which I don't see mentioned is that the audio signal is not always going directly to your ear. Is it possible for the sound bouncing around the room to break such assumptions?
This is all about representing signals inside a computer. Audio played from a speaker (or as it exists in the physical domain) is continuous and your ear doesn't have a sample rate. So there's no concept of a Nyquist limit or aliasing with physical sound.
A higher sampling rate makes it easier to identify non-sound disturbances, like vibration or electrical interference, that can show up at multiples of some frequency.
I have referred so many people to this paper over the years. It does a great job of dispelling a lot of misunderstandings people have about Nyquist's theorem.
For anyone interested in an interactive tool for playing with the concepts noted, here’s something I put together a while back for demonstrating to colleagues: https://www.desmos.com/calculator/pma1jpcuv0.
So, we first need to find the highest motion frequency a sensor might experience in the experiment, and then make sure the sampling rate is at least twice that to avoid aliasing?
Meaning, if an IMU sensor is mounted on a very slowly moving RC vehicle, say 2 cm/s, then the sampling rate can be very slow. But if the sensor is on a fast-moving drone, we need to estimate the highest frequency of the motion and make sure our sampling rate is at least double that?
Yes, pretty much, but it depends. Your 2 cm/s RC vehicle might still experience high-frequency vibrations (in the range of the motor rpm) and you might want to sample that.
Whether it is a signal of interest or not, anything above half the sampling rate will be aliased down into lower frequencies. So all energy above half the sampling rate should be removed to avoid this disturbance.
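A tiny illustration of that folding (numbers made up): a 50 Hz motor vibration read out at 3 Hz with no anti-alias filter produces samples that are literally identical to a genuine 1 Hz motion component.

    import numpy as np

    fs_slow = 3.0                                 # IMU read-out rate, Hz
    t = np.arange(90) / fs_slow                   # 30 seconds of samples

    vibration = 0.1 * np.cos(2 * np.pi * 50 * t)  # 50 Hz vibration, sampled at 3 Hz
    alias     = 0.1 * np.cos(2 * np.pi * 1 * t)   # what a real 1 Hz motion would look like

    # 50 Hz folds down to |50 - 17*3| = 1 Hz, so the two are indistinguishable after sampling.
    print(np.max(np.abs(vibration - alias)))      # ~0, up to floating-point error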
So a low-pass filter is always mandatory? For example, in this RC car case - let's say we are sampling at 1 Hz and the signal appears to be clean. But we still have to run a low-pass filter to ensure the motor vibrations are removed?
It’s interesting to me how frequently people will talk about sampling performance profilers and not know about Nyquist.
Given that a CPU runs at many GHz but SW sampling profilers run at ~1 kHz or at most 10 kHz, it’s really hard to profile software if you’re targeting processing at MHz rates.
I'm not sure what you mean here, but I would say that sampling profilers are a good example of Nyquist applied correctly for the most part: your signal (the distribution of where your software is spending its time) is likely to have an extremely low bandwidth (as in, it's basically static), so why try to sample super rapidly? It's much more of a question of whether you get enough samples to represent that distribution accurately, not how quickly you get those samples.
You’re talking about profiling a normal running program. If you’re profiling a benchmark that’s always executing the hot path, where exactly the hot path contribution lies becomes less clear. This is because you run into aliasing with the samples being collected at unhelpful points. Maybe if you run long enough you start to see a picture, but at 1 kHz you’re going to have to run for a very long time. The other way aliasing comes into effect is that it becomes hard to see the impact of “individually cheap” but often executed pieces of code (e.g. a simple bounds check may not show up even if it’s responsible for a 20% slowdown because the probability of your sampler hitting it is small when that bounds check takes nanoseconds to execute vs your ms profiler sample rate).
Basically a 1 ms sampler can pick out a signal that’s 2 ms or longer in periodicity if sampled once (all faster signals will get aliased). To get to 1 GHz (once a nanosecond) would require capturing 1 million times more samples, and you’re still dealing with aliasing screwing up the picture you’re getting (although maybe with stack sampling you get disambiguation to combat aliasing? Not sure).
I don't think what you're talking about is aliasing: it's more to do with the statistics of sampling. Though even then I still don't quite get what you mean: if a bounds check is 20% of your runtime then you're going to see it in your samples pretty quickly. If it's a small enough fraction of your runtime that you don't expect to see it in millions of samples, then why is it relevant to your performance? Now, if you're worried about latency outliers, I can see why sampling may not be a useful tool, but again I don't think the reason for that is really aliasing.
If the 20% is a hotspot yes. If the 20% is because it’s been inlined and split across 100 different call sites each contributing 0.2%, I don’t think it’s so easy to spot.
> a simple bounds check may not show up even if it’s responsible for a 20% slowdown because the probability of your sampler hitting it is small when that bounds check takes nanoseconds to execute vs your ms profiler sample rate.
Surely for this to happen you'd have to be putting a lot of effort into getting a perfect 1ms sampling rate, and even a little bit of variation in that would be more than enough to handle aliasing issues.
It’s been a few years, but if I recall correctly the fact that there’s variation in the sampling itself makes the aliasing worse, not better. At the very least it should be no different.
It sounds like one of your concerns is being catastrophically unlucky with the sampling rate:
> This is because you run into aliasing with the samples being collected at unhelpful points.
I interpret this as you saying "we sample at times T, 2T, ..., but there's a hotspot that hits at T+0.001, T+0.003, ..., T+0.999, 2T+0.001, ..., and we never get to visit it". I'll grant that this could happen, altho it seems contrived, but my claim is that by sampling at "T+/- epsilon, 2T +/- 2 epsilon, ...." sooner or later you're going to start hitting that hotspot. And before too long if, say, 5% of the time the CPU is executing that code you're going to hit it, on average, 5% of the time.
It'll be aliased, sure, but in a way that smears across all frequency bins instead of getting missed. You won't be able to recover the true frequency (at least not without fancy sparse methods) but why do you care? The important question is "where is the CPU spending its time" and not "is this function being entered 100000 times per second or 100001 times".
Here's another general objection. The things being sampled are square waves: a function is either in the call stack or it's not, the program counter is either at a particular location or it's not, and so on. That means you're going to have energy at all odd partials, which you'll have to account for somehow, but however you do it it's not going to reflect the underlying behavior.
It’s not about being unlucky, it’s the second thing you point out. It’s not the difference between 100 kHz and 100.001 kHz. You’re going to roughly alias at multiples of the sampling frequency, so you can’t tell between 2 kHz, 10 kHz, and 100 kHz, which means that even if you collect enough signal, your 1 MHz signal with code running at 1 GHz is going to potentially look like a 10 kHz signal, leading you to make the wrong conclusions about where time is being spent.
Remember - profiling tools aren’t even giving you a frequency bin and indicating which samples you see there. It’s giving you a sample and estimating frequency. Most people are not optimizing code that’s running at millions of times per second so it’s not a common problem, but all sorts of wrong conclusions start to get made.
> so you can’t tell between 2 kHz, 10 kHz, and 100 kHz
Only when your sample rate is perfect!
Let's say we have a 1 GHz system, one instruction per cycle, a signal at 1 MHz, and we're measuring at 1 kHz. If we're rock-solid at 1 kHz then, as you said, we can't distinguish 1 MHz from 2 kHz, but if we're off by a little bit, then things change.
Let's say the 1 MHz events are at cycles 50, 1050, 2050, and so on, and the 1 kHz sample rate triggers at cycles 0, 1000000, 2000000, and so on. You'll obviously never see that event.
But suppose there's a little bit of noise in the sample-rate timing and we're just under 1 kHz, so now the samples are at 0, 1000001, 2000001, 3000005, etc. Sooner or later we're going to hit that 1 MHz event, and we're not going to hit it at the same rates as we would a 2 kHz event, a 10 kHz event, and so on.
We might not hit it that often, but we'll hit it sooner or later, and we'll also hit a lot of its neighbors.
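A quick simulation of that (entirely made-up workload): an event occupies 1% of every 1000-cycle period (cycles 50-59), and we sample every 1,000,000 cycles. A perfectly regular sampler never lands on it; adding a little timing jitter makes the hit rate converge to the true ~1%.

    import numpy as np

    rng = np.random.default_rng(0)
    period, start, width = 1_000, 50, 10      # event occupies cycles 50-59 of every 1000-cycle period
    interval, n_samples = 1_000_000, 200_000  # ~1 kHz sampling of a 1 GHz clock, 200 seconds' worth

    def hit_rate(sample_times):
        phase = sample_times % period
        return np.mean((phase >= start) & (phase < start + width))

    regular  = np.arange(n_samples, dtype=np.int64) * interval
    jittered = regular + rng.integers(0, period, n_samples)   # up to 1 us of jitter per sample

    print("regular sampler: ", hit_rate(regular))    # 0.0 -- never sees the event
    print("jittered sampler:", hit_rate(jittered))   # ~0.01 -- matches its true share of time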
You’re not describing anything different from how Nyquist pops up in analog signals either - sampling and signal will always have jitter and noise. I would recommend reading the linked pdf if you haven’t done so as it gives a number of examples of how Nyquist can screw up your intuition.
I read it when it was first posted, and skimmed it again just now to be sure, and my conclusion is unchanged: Nyquist has nothing useful to say about sampling profilers. I think the author would agree with me. See, for example, the discussion of EKG signals, or this bit from the intro:
> The difficulty with the Nyquist-Shannon sampling theorem is that it is based on the notion that the signal to be sampled must be perfectly band limited. This property of the theorem is unfortunate because no real world signal is truly and perfectly band limited.
whose relevance I'll get to below.
(I'm assuming that by "sampling profiler" you mean the usual practice of recording on a cadence the call stack / program counter / etc of some code; if this is not what you meant please clarify)
If you approach sampling profiling from a signals viewpoint, what's the signal that you're sampling? I see it as a collection of pulse waves of varying (and irregular!) duty cycles, each pulse wave corresponding to a yes/no answer to "is the program inside this function / executing this instruction / etc". At any given sample we collect our data and that tells us which waves are high; all others will be low.
Nyquist, as the above quote points out, only applies to perfectly band limited signals. Not only are pulse waves not perfectly band limited, they're actually extremely not band limited, with energy at harmonics going all the way up. Right away that should tell you that Nyquist is the wrong way to be thinking about things!
And furthermore, Nyquist tells you what you need if you want to reconstruct the original signal, but why do you want to do that in the first place? Do you actually care about the phase of the wave and the timings of the pulses, or do you just care about how often the wave is high? (i.e. how often any bit of code is executed). I don't think I've ever cared about the phase and timings when profiling, but I do care very much about how often the wave is high.
One possible issue is that if sampling is even slightly biased, it can incorrectly estimate the relative frequency of different points/functions in tight inner function calls (which can't happen with infrequently called functions with a long runtime)?