Unless you’ve put some serious effort into learning to master your audio, the concept of dither can be fairly bewildering. We’ve all been told to “do” it, but what are we actually doing?
If you’ve done any experimentation, it may be one area that hasn’t led you to any firm conclusions, likely because it’s so difficult to hear, even under ideal circumstances.
So, in this article, I hope to uncover some of the mystery behind it, but present it practically so that you can make more informed decisions during the mastering process.
Fair warning, dither is a relatively technical topic so this will be a fairly technical article, but I guarantee that you will learn something. There will be lots of listening examples as well, so grab your nice pair of ‘phones if you want to get critical.
An example of the mastering process in terms of bit-depth and sample rate:
These are the sample rates and bit-depths I typically track/process with, but the changes from step to step are what you should take note of.
—Final Mix @ 24-bit 88.2 kHz
—Import to mastering DAW @ 24-bit 88.2 kHz (increasing the bit-depth to 32-bit will maintain better dynamic precision throughout processing, but I’ll use 24-bit for my examples)
—Process @ 24-bit 88.2 kHz
—Downsample to 24-bit 44.1 kHz
—Dither down to 16-bit 44.1 kHz
—Finished CD Quality Master @ 16-bit 44.1 kHz
Meat & Potatoes
To refresh, bit-depth can be considered synonymous with the amplitude/dynamic precision of a digital signal. In a 16-bit signal there are 65,536 (-32768 to +32767) discrete steps, or values, and in a 24-bit signal that number increases to over 16 million.
To put that in perspective, if each step in an unsigned (only positive values) 16-bit signal was a piece of paper, the stack would be almost 22 feet high. In a 24-bit signal, the stack would be over a mile high.
While that’s a good analogy for comparing the number of possible values, it’s important to remember that they both share the same ceiling at 0 dBFS.
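The step counts above, and the familiar "about 6 dB per bit" rule of thumb for dynamic range, both fall out of the same arithmetic. Here is a quick sketch (the function names are mine, just for illustration):

```python
import math

def quantization_levels(bits: int) -> int:
    """Number of discrete amplitude steps at a given bit depth."""
    return 2 ** bits

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range: 20*log10(2^bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

print(quantization_levels(16))         # 65536
print(quantization_levels(24))         # 16777216
print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```

Both formats top out at the same 0 dBFS ceiling; the extra bits buy finer steps below it, not a higher ceiling.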
Unfortunately, due to the limitations of the final medium (CD), we must convert the signal to a depth of 16 bits. To do so, the converter goes along, sample-by-sample, and chops off the last eight bits. Those are now lost and gone forever. This process inherently leaves distortion and other digital artifacts behind, because we’ve erased pertinent information from the signal.
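That "chop off the last eight bits" step is literally an 8-bit right shift. A small sketch, using made-up sample values of my own:

```python
def truncate_24_to_16(sample_24: int) -> int:
    """Drop the lower 8 bits of a 24-bit sample (no dither, no rounding)."""
    return sample_24 >> 8

# A quiet 24-bit value can vanish entirely:
quiet = 0x00009B                   # 155 on the 24-bit scale
print(truncate_24_to_16(quiet))    # 0 -- the sample is simply erased

# Shifting a louder sample back up shows the information left behind:
loud = 0x5A21CE
error = loud - (truncate_24_to_16(loud) << 8)
print(error)                       # 206 -- the discarded low-byte detail
```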
In the mid to high amplitude portions of the signal, these errors don’t manifest as audibly as they do in the low-level portions, because they’re masked by actual audio. This is the reason you normally wouldn’t notice if a track did not use dither; the “bad stuff” is on the outskirts of the audio.
This is also the same reason that many engineers and audiophiles tend to perceive un-dithered audio as harsh, “lacking depth,” or cold — but definitely not unlistenable (unless the mix is just bad, but that’s another story).
Reverbs and room ambience are usually the quietest portions of a track, but are just as responsible for creating the space and “feel” of a track as the music itself.
If you’re a mix engineer, you know what it’s like to make pass after pass to ensure that the reverb on that vocal isn’t too loud or too soft, so why negatively impact your client’s work, or your own, in the mastering process?
Audio Examples — Before
Before we talk about what dither is and how it functions, take a listen to these examples:
Example 1.1: Original Source File @ 24-bit 44.1 kHz—Listen
Example 1.2: Source Truncated (no dither applied) to 16-bit 44.1 kHz—Listen
Example 1.3: In this example, I turned the source audio down -60 dB, truncated the audio, and normalized the result to 0 dBFS. Lowering the level before truncation exaggerates the quantization error relative to the audio and, essentially, turns the noise floor up to be equal with the audio.—Listen
This provides some perspective on what exists in the noise floor of un-dithered audio and what we’re trying to correct. Examples using this process will be followed by the word “Normalized.”
Anyone will tell you that last one sounds bad (or really cool, if you’re like me), but here is why. When we reduce our bit depth to 16-bit, the smallest value we can encode originally spanned 256 discrete values in 24-bit (and tens of thousands in 32-bit).
Now any value smaller than what we call the Least Significant Bit (a step of ±1 on the 16-bit scale) has become 0. What we end up with is what you heard in the last example: distortion, harmonics, and dropouts.
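To see why the normalized example sounds the way it does, here is a rough simulation of the same trick. A short 440 Hz sine (my stand-in, not the actual source file) is rendered at 24-bit, turned down -60 dB, then truncated:

```python
import math

FULL_24 = 2**23 - 1  # peak value of a signed 24-bit sample

def db_to_gain(db: float) -> float:
    return 10 ** (db / 20)

# A 440 Hz sine, 10 ms at 44.1 kHz, as 24-bit integer samples:
sine = [int(FULL_24 * math.sin(2 * math.pi * 440 * n / 44100))
        for n in range(441)]

# Turn it down -60 dB, then truncate to 16-bit resolution (drop the low byte):
attenuated = [int(s * db_to_gain(-60)) for s in sine]
truncated = [v >> 8 for v in attenuated]

# Hundreds of distinct levels collapse to a few dozen coarse steps; that
# stair-stepped waveform is the distortion you hear once it's normalized.
print(len(set(attenuated)), len(set(truncated)))
```

At -60 dB the whole waveform fits inside roughly the bottom 33 steps of the 16-bit scale, so the smooth sine becomes a crude staircase, which is exactly the harmonic grunge in Example 1.3.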
So, what are we going to do to correct all of this distortion and digital artifacting? The answer is to add more noise. While it sounds counterintuitive, the goal is to prevent the distortion by summing the original signal with low-level random noise (think pink/white noise) chosen so that, on average, the quantized result works out to the original signal.
If we apply dither, then as each sample is converted, a random value at roughly -90 to -100 dBFS is added to it. By doing so, we are actually able to encode original values smaller than the LSB, because we’re adding a comfortable noise floor for the low-level audio to sit in, rather than letting those values collapse to 0 just because the original signal is below the threshold of the LSB.
The random values are also different between left and right samples to maintain stereo separation.
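The scheme just described can be sketched in a few lines. Everything here is a toy model of my own (the `LSB` constant, the triangular-PDF noise shape, the rounding offset), not the exact algorithm of any particular converter or dither plug-in:

```python
import random

LSB = 256  # one 16-bit step, expressed in 24-bit sample values (2^8)

def truncate(v: int) -> int:
    """Plain bit-chop: everything below one 16-bit step is lost."""
    return v >> 8

def tpdf_dither(v: int, rng: random.Random) -> int:
    """Sum two uniform random values (a triangular PDF, roughly +/-1 LSB)
    into the sample before requantizing; +LSB//2 rounds instead of flooring."""
    noise = rng.randint(-LSB // 2, LSB // 2) + rng.randint(-LSB // 2, LSB // 2)
    return (v + noise + LSB // 2) >> 8

rng = random.Random(42)
v = 100                  # a 24-bit value smaller than one 16-bit LSB
print(truncate(v))       # 0 -- without dither this sample is silenced
avg = sum(tpdf_dither(v, rng) for _ in range(100_000)) / 100_000
print(avg)               # roughly 0.39, i.e. ~100/256: the average keeps the level
```

The dithered output encodes the sub-LSB level as the *density* of 0-to-1 toggles over time, which our ears (and any downstream averaging) integrate back into the quiet signal instead of hearing it drop out.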
If you’re a visual person, here’s an example from Bob Katz’s book “Mastering Audio: The Art and the Science.” This is for a mono-signal, but the same thing would happen in a stereo signal for both left and right.
                   ——Upper 16 bits——     ——Lower 8——
Original 24-bit:   MXXX XXXX XXXX XXXW   YYYY YYYY
Add random number:                       ZZZZ ZZZZ
The bit ‘W’ becomes the LSB of the 16-bit signal, and it is the bit that gets toggled depending on whether the lower eight bits plus the random number carry over into the upper sixteen.
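In code terms, the diagram boils down to whether the lower byte plus the random byte Z produces a carry into bit W. The sample value `v` and dither byte `z` below are hypothetical numbers chosen to show one such carry:

```python
# Katz's diagram as arithmetic: W flips only when the lower 8 bits of the
# sample plus the random dither byte overflow past 255.
v = 0x40F3A7          # a 24-bit sample; its lower byte YYYY YYYY is 0xA7
z = 0xB1              # a hypothetical random dither byte ZZZZ ZZZZ
print(v >> 8)         # 16627 -- plain truncation of the upper 16 bits
print((v + z) >> 8)   # 16628 -- 0xA7 + 0xB1 carried, so W was rounded up
```

With a different random byte (say, 0x20) there would be no carry and W would stay put; averaged over many samples, those toggles are what preserve the sub-LSB information.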
Now that we’ve talked about it, let’s listen to it. Exciting!