Among the many definitions for the wonderful word "dharma"
is the essential function or nature of a thing. That is what this
note is about: the essential function or nature of audio analog-to-digital
(A/D) converters.
Like everything else in the world, the audio industry has been radically
and irrevocably changed by the digital revolution. No one has
been spared. Arguments will ensue forever about whether the true
nature of the real world is analog or digital; whether the fundamental
essence, or dharma, of life is continuous (analog) or exists
in tiny little chunks (digital).
Seek not that answer here. Here we shall but resolve to understand
the dharma of audio A/D converters.
Data Conversion
It is important at the outset of exploring digital audio to understand
that once a waveform has been converted into digital format, nothing
can inadvertently occur to change its sonic properties.
While it remains in the digital domain, it is only a series of digital
words, representing numbers. Aside from the gross example of having
the digital processing actually fail and cause a word to be lost
or corrupted beyond use, nothing can change the sound of the
word. It is just a bunch of "ones" and "zeroes."
There are no "one-halves" or "three-quarters".
The point being that sonically, it begins and ends with the
conversion process. Nothing is more important to digital audio than
data conversion. Everything in-between is just arithmetic and waiting.
That's why there is such a big to-do with data conversion. It really
is that important. Everything else quite literally is just details.
We could go so far as to say that data conversion is the art of
digital audio while everything else is the science, in that it is
data conversion that ultimately determines whether or not the original
sound is preserved (and this comment certainly does not negate the
enormous and exacting science involved in truly excellent data conversion).
Since analog signals continuously vary between an infinite number
of states and computers can only handle two, the signals must be
converted into binary digital words before the computer can
work. Each digital word represents the value of the signal at one
precise point in time. Today's common word length is 16-bits or
32-bits. Once converted into digital words, the information may
be stored, transmitted, or operated upon within the computer.
In order to properly explore the critical interface between the
analog and digital worlds, it is necessary to review a few fundamentals
and a little history.
Binary Numbers
Whenever we speak of "digital," by inference, we speak
of computers (throughout this paper the term "computer"
is used to represent any digital-based piece of audio equipment).
And computers in their heart of hearts are really quite simple.
They only can understand the most basic form of communication or
information: yes/no, on/off, open/closed, here/gone - all of which
can be symbolically represented by two things - any two things.
Two letters, two numbers, two colors, two tones, two temperatures,
two charges - it doesn't matter. Unless you have to build something
that will recognize these two states - now it matters.
So, to keep it simple we choose two numbers: one and zero ... a
"1" and a "0." Officially this is known as binary
representation, from the Latin bini, "two by two." In mathematics
this is a base-2 number system, as opposed to our decimal (from
the Latin decima, "a tenth part or tithe") number system, which
is called base-10 because we use the ten numbers 0-9.
In binary we use only the numbers 0 and 1. "0" is a good
symbol for no, off, closed, gone, etc., and "1" is easy
to understand as meaning yes, on, open, here, etc. In electronics
it is easy to determine whether a circuit is open or closed, conducting
or not conducting, has voltage or doesn't have voltage.
Thus the binary number system found use in the very first computer,
and nothing has changed today. Computers just got faster and smaller
and cheaper, with memory size becoming incomprehensibly large in
an incomprehensibly small space.
One problem with using binary numbers is they become big and unwieldy
in a hurry. For instance, it takes six digits to express my age
in binary, but only two in decimal. But, in binary, we better not
call them "digits" since "digits" implies a
human finger or toe, of which there are ten, so confusion reigns.
To get around that problem John Tukey of Bell Laboratories dubbed
the basic unit of information (as defined by Shannon - more on him
later) a binary unit, or "binary digit" which became
abbreviated to "bit." A bit is the simplest possible message
representing one of two states.
So, I'm 6-bits old ... well, not quite. But it takes 6-bits to express
my age as 110111. Let's see how that works. I'm fifty-five years
old. So in base-10 symbols that is "55," which stands
for 5-1s plus 5-10s. You may not have ever thought about it, but
each digit in our everyday numbers represents an additional power
of 10 beginning with 0.
Figure 1. Number Representation Systems
That is, the first digit represents
the number of 1s (10^0), the second digit represents the number
of 10s (10^1), the third digit represents the number of 100s
(10^2), and so on. We can represent any size number by using
this shorthand notation.
Binary number representation is just the same except substituting
the powers of 2 for the powers of 10 [any base number system
is represented in this manner]. Therefore (moving from right
to left) each succeeding bit represents 2^0 = 1, 2^1 = 2, 2^2 = 4,
2^3 = 8, 2^4 = 16, 2^5 = 32, etc.
Thus, my age breaks down as 1-1, 1-2, 1-4, 0-8, 1-16, and
1-32, represented as "110111," which is 32+16+0+4+2+1
= 55 ... or double-nickel to you cool cats. Fig. 1
shows the two examples.
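The positional breakdown above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `positional_terms` is made up for this note and is not part of any library.

```python
def positional_terms(value, base):
    """Split a non-negative integer into (digit, place_value) pairs,
    most significant position first."""
    if value < 0 or base < 2:
        raise ValueError("expects value >= 0 and base >= 2")
    terms = []
    place = 1  # base**0 for the rightmost position
    while True:
        value, digit = divmod(value, base)
        terms.append((digit, place))
        place *= base
        if value == 0:
            break
    return list(reversed(terms))


age = 55
binary_terms = positional_terms(age, 2)
decimal_terms = positional_terms(age, 10)

# Join the binary digits into the familiar bit string.
bits = "".join(str(digit) for digit, _ in binary_terms)

print("decimal terms:", decimal_terms)
print("binary terms: ", binary_terms)
print("bit string:   ", bits)
print("sum of terms: ", sum(digit * place for digit, place in binary_terms))
```

Each pair is one digit together with the power of the base it multiplies, so the same function covers both halves of Fig. 1.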
Now let's take a brief look at how all this came about.
The Story of Harry & Claude
The French mathematician Fourier unknowingly laid the groundwork
for A/D conversion in the early 19th century. All data conversion
techniques rely on looking at, or sampling, the input signal
at regular intervals and creating a digital word that represents
the value of the analog signal at that precise moment. The fact
that we know this works lies with Nyquist.
Harry Nyquist discovered this while working at Bell Laboratories in the
late '20s and wrote a landmark paper [1] describing the criteria
for what we know today as sampled data systems. Nyquist taught us
that for periodic functions, if you sampled at a rate that was at
least twice as fast as the signal of interest, then no information
(data) would be lost upon reconstruction.
And since Fourier had already shown that all alternating signals
are made up of nothing more than a sum of harmonically related sine
and cosine waves, then audio signals are periodic functions
and can be sampled without loss of information following Nyquist's
instructions.
The resulting limit became known as the Nyquist frequency: the
highest frequency that may be accurately sampled, equal to one-half
of the sampling frequency. For example, the theoretical Nyquist
frequency for the audio CD (compact disc) system is 22.05 kHz, equaling
one-half of the standardized sampling frequency of 44.1 kHz.
As powerful as Nyquist's discoveries were, they were not without
their dark side: the biggest being aliasing frequencies.
Following the Nyquist criteria (as it is now called) guarantees
that no information will be lost; it does not, however, guarantee
that no information will be gained.
Although by no means obvious, the act of sampling an analog signal
at precise time intervals is an act of multiplying the input
signal by the sampling pulses. This introduces the possibility of
generating "false" signals indistinguishable from the
original. In other words, given a set of sampled values, we cannot
relate them specifically to one unique signal. As Fig. 2 shows,
the same set of samples could have resulted from any of the
three waveforms shown ... and from all possible sum and difference
frequencies between the sampling frequency and the one being sampled.
All such false waveforms that fit the sample data are called "aliases."
In audio, these frequencies show up mostly as intermodulation distortion
products, and they come from the random-like white noise, or any
sort of ultrasonic signal present in every electronic system. Solving
the problem of aliasing frequencies is what improved audio conversion
systems to today's level of sophistication. And it was Claude Shannon
who pointed the way.
Figure 2. Aliasing Frequencies
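The ambiguity pictured in Fig. 2 is easy to reproduce numerically. The sketch below assumes ideal, instantaneous sampling at the CD rate and an arbitrarily chosen 1 kHz tone. It uses cosines so that the sum and difference frequencies give literally identical samples (with sines, the difference frequency gives the same samples with inverted polarity).

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine at the given rate."""
    return [
        math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]


def worst_mismatch(a, b):
    """Largest absolute difference between two equal-length sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


SAMPLE_RATE_HZ = 44_100.0  # CD sampling frequency from the text
TONE_HZ = 1_000.0          # illustrative tone, an assumed value
COUNT = 64

original = sample_cosine(TONE_HZ, SAMPLE_RATE_HZ, COUNT)
difference_alias = sample_cosine(SAMPLE_RATE_HZ - TONE_HZ, SAMPLE_RATE_HZ, COUNT)
sum_alias = sample_cosine(SAMPLE_RATE_HZ + TONE_HZ, SAMPLE_RATE_HZ, COUNT)

# Both mismatches are at floating-point rounding level: the three
# waveforms cannot be told apart from their samples alone.
print("1 kHz vs 43.1 kHz:", worst_mismatch(original, difference_alias))
print("1 kHz vs 45.1 kHz:", worst_mismatch(original, sum_alias))
```

Once the samples are taken, nothing in the data says which of the three tones was actually present, which is why any fix has to happen before or during sampling.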
Shannon is recognized as the father of information theory: while
a young engineer at Bell Laboratories in 1948, he defined an entirely
new field of science. Even before then his genius shone through,
for while still a 22-year-old student at MIT, he showed in his master's
thesis how the algebra invented by the British mathematician George
Boole in the mid-1800s could be applied to electronic circuits.
Since that time, Boolean algebra has been the rock of digital
logic and computer design. [2]
Shannon studied Nyquist's work closely and came up with a deceptively
simple addition. He observed (and proved) that if you restrict the
input signal's bandwidth to less than one-half the sampling frequency
then no errors due to aliasing are possible. So bandlimiting
your input to no more than one-half the sampling frequency guarantees
no aliasing. Cool ... only it's not possible.
In order to satisfy the Shannon limit (as it is called - Harry gets
a "criteria" and Claude gets a "limit") you
must have the proverbial brick-wall, i.e., infinite-slope filter.
Well, this isn't going to happen, not in this universe. You cannot
guarantee that there is absolutely no signal (or noise) greater
than the Nyquist frequency. Fortunately there is a way around this
problem. In fact, you go all the way around the problem and look
at it from another direction.
If you cannot restrict the input bandwidth so aliasing does not
occur, then solve the problem another way: Increase the sampling
frequency until the aliasing products that do occur, do so at ultrasonic
frequencies, and are effectively dealt with by a simple single-pole
filter. This is where the term "oversampling" comes in.
For full spectrum audio the minimum sampling frequency must be 40
kHz, giving you a useable theoretical bandwidth of 20 kHz -- the
limit of normal human hearing.
Sampling at anything significantly higher than 40 kHz is termed
oversampling. In just a few years' time, we have seen the
audio industry go from the CD system standard of 44.1 kHz, and the
pro audio quasi-standard of 48 kHz, to 8-times and 16-times oversampling
frequencies of around 350 kHz and 700 kHz respectively. With sampling
frequencies this high, aliasing is no longer an issue.
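A rough way to see why the higher rates help is to compute where an out-of-band tone lands after sampling. The sketch below assumes ideal sampling with no input filter, a hypothetical 30 kHz ultrasonic tone, and oversampled rates that are exact multiples of 44.1 kHz (the text gives only approximate figures). The function name `apparent_frequency` is illustrative.

```python
def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears, folded into 0..fs/2."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


BASE_RATE_HZ = 44_100      # CD sampling frequency from the text
ULTRASONIC_HZ = 30_000     # hypothetical out-of-band tone

for factor in (1, 8, 16):
    rate = BASE_RATE_HZ * factor
    landed = apparent_frequency(ULTRASONIC_HZ, rate)
    in_audio_band = landed <= 20_000
    print(f"{factor:>2}x ({rate} Hz): tone appears at {landed} Hz, "
          f"inside audio band: {in_audio_band}")
```

At the base rate the tone folds down into the audible band. At 8 and 16 times the base rate it stays at its true ultrasonic frequency, where a gentle filter can remove it.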
Okay. So audio signals can be changed into digital words (digitized)
without loss of information, and with no aliasing effects, as long
as the sampling frequency is high enough. How is this done?
Quantization
Quantizing is the process of determining which of the possible values
(determined by the number of bits or voltage reference parts) is
the closest value to the current sample - i.e., you are assigning
a quantity to that sample.
Quantizing, by definition then, involves deciding between two values
and thus always introduces error. How big the error, or how accurate
the answer, depends on the number of bits. The more bits, the better
the answer. The converter has a reference voltage which is divided
up into 2^n parts, where n is the number of bits. Each part represents
the same value. Since you cannot resolve anything smaller than this
value, there is error. There is always error in the conversion process.
This is the accuracy issue.
Figure 3. 8-Bit Resolution
The number of bits determines the converter accuracy. For 8-bits,
there are 2^8 = 256 possible levels as shown in Fig. 3. Since the
signal swings positive and negative there are 128 levels
for each direction. Assuming a ±5 V reference [3], this makes
each division, or bit, equal to 39 mV (5/128 = 0.039).
Hence, an 8-bit system cannot resolve any change smaller than 39
mV. This means a worst-case accuracy error of 0.78 percent. Table
1 compares the accuracy improvement gained by 16-bit, 20-bit and
24-bit systems along with the reduction in error. (Note: this is
not the only way to use the reference voltage.
Many schemes exist for coding, but this one nicely illustrates the
principles involved.) Each step size (resulting from dividing the
reference into the number of equal parts dictated by the number
of bits) is equal and is called a quantizing step (also called
quantizing interval -- see Fig. 4).
Originally this step was termed the LSB (least significant
bit) since it equals the value of the smallest coded bit; however,
it is an illogical choice for mathematical treatments and has since
been replaced by the more accurate term quantizing step.
| # Bits | # Divisions | Resolution/Div | Max % Error | Max PPM Error |
|---|---|---|---|---|
| 8 | 2^7 = 128 | 39 mV | 0.78 | 7812.00 |
| 16 | 2^15 = 32,768 | 153 µV | 0.003 | 30.50 |
| 20 | 2^19 = 524,288 | 9.5 µV | 0.00019 | 1.90 |
| 24 | 2^23 = 8,388,608 | 0.6 µV | 0.000012 | 0.12 |

Table 1. Quantization Steps For ±5 Volts Reference
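The entries in Table 1 follow directly from the arithmetic described above. The sketch below recomputes them under the same coding assumption the text uses (half of the 2^n codes for each polarity of a ±5 V reference). The function and field names are illustrative only.

```python
V_REF = 5.0  # volts per polarity, the +/-5 V reference assumed in the text


def table_row(bits, v_ref=V_REF):
    """Recompute one row of Table 1 for the given word length."""
    divisions = 2 ** (bits - 1)        # half of the 2^n codes per polarity
    step_v = v_ref / divisions         # quantizing step in volts
    fraction = 1.0 / divisions         # one step relative to full scale
    return {
        "bits": bits,
        "divisions": divisions,
        "step_v": step_v,
        "max_pct_error": 100.0 * fraction,
        "max_ppm_error": 1e6 * fraction,
    }


for bits in (8, 16, 20, 24):
    row = table_row(bits)
    print(f"{row['bits']:>2} bits: {row['divisions']:>9,} divisions, "
          f"step = {row['step_v'] * 1e6:12.3f} uV, "
          f"max error = {row['max_pct_error']:.6f} % "
          f"({row['max_ppm_error']:.2f} ppm)")
```

The printed values agree with Table 1 to within the rounding used in the table.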
Figure 4. Quantization -- 3-Bit, 5V Example
The error due to the quantizing
process is called quantizing error (no definitional
stretch here). As shown earlier, each time a sample is taken
there is error. Here's the not obvious part: the quantizing
error can be thought of as an unwanted signal which the quantizing
process adds to the perfect original.
An example best illustrates this principle. Let the sampled
input value be some arbitrarily chosen value, say, 2 volts.
And let this be a 3-bit system with a 5 volt reference. The
3 bits divide the reference into 8 equal parts (2^3 = 8) of
0.625 V each, as shown in Fig. 4. For the 2 volt input example,
the converter must choose between either 1.875 volts or 2.50
volts, and since 2 volts is closer to 1.875 than to 2.5,
1.875 volts is the best fit.
This results in a quantizing error of -0.125 volts, i.e., the
quantized answer is too small by 0.125 volts. If the input signal
had been, say, 2.2 volts, then the quantized answer would have been
2.5 volts and the quantizing error would have been +0.3 volts, i.e.,
too big by 0.3 volts.
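The two worked examples can be reproduced with a nearest-level quantizer. This is a minimal sketch of the 3-bit, 5 volt case only. Clamping at the top code is an added assumption, and Python's built-in `round` resolves exact half-step ties to the even level, which does not matter for these inputs.

```python
def quantize(sample_v, bits=3, v_ref=5.0):
    """Return (quantized_volts, error_volts) for a unipolar sample.

    Assumes levels at 0, step, 2*step, ... up to (2**bits - 1) * step.
    """
    step = v_ref / 2 ** bits                  # 0.625 V for 3 bits and 5 V
    level = round(sample_v / step)            # nearest quantizing level
    level = max(0, min(level, 2 ** bits - 1)) # stay inside the code range
    quantized = level * step
    return quantized, quantized - sample_v    # error = output minus input


for sample in (2.0, 2.2):
    quantized, error = quantize(sample)
    print(f"input {sample} V -> {quantized} V, error {error:+.3f} V")
```

A negative error means the quantized answer is too small and a positive error means it is too big, matching the sign convention used in the text.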
These alternating unwanted signals added by quantizing form a quantized
error waveform, that is a kind of additive broadband noise that
is generally uncorrelated with the signal and is called quantizing
noise. Since the quantizing error is essentially random (i.e.
uncorrelated with the input) it can be thought of like white
noise (noise with equal amounts of all frequencies). This is
not quite the same thing as thermal noise, but it is similar. The
energy of this added noise is equally spread over the band from
dc to one-half the sampling rate. This is a most important point
and will be returned to when we discuss delta-sigma converters and
their use of extreme oversampling.
Successive Approximation
Successive approximation is one of the earliest and most successful
analog-to-digital conversion techniques. Therefore, it is no surprise
it became the initial A/D workhorse of the digital audio revolution.
Successive approximation paved the way for the delta-sigma techniques
to follow.
The heart of any A/D circuit is a comparator. A comparator is an
electronic block whose output is determined by comparing the values
of its two inputs. If the positive input is larger than the negative
input then the output swings positive, and if the negative input
exceeds the positive input, the output swings negative.
Therefore if a reference voltage is connected to one input and an
unknown input signal is applied to the other input, you now have
a device that can compare and tell you which is larger.
Thus a comparator gives you a "high output" (which could
be defined to be a "1") when the input signal exceeds
the reference, or a "low output" (which could be defined
to be a "0") when it does not. A comparator is the key
ingredient in the successive approximation technique as shown in
Figures 5A & 5B.
Figure 5A. Successive Approximation Example
The name successive approximation nicely sums up how the
data conversion is done. The circuit evaluates each sample and creates
a digital word representing the closest binary value. The process
takes the same number of steps as bits available, i.e., a 16-bit
system requires 16 steps for each sample. The analog sample
is successively compared to determine the digital code, beginning
with the determination of the biggest (most significant) bit of
the code.
Figure 5B. Successive Approximation A/D Converter
The description given in Daniel
Sheingold's Analog-Digital Conversion Handbook (see
References) offers the best analogy as to how successive approximation
works. The process is exactly analogous to a gold miner's
assay scale, or a chemical balance as seen in Figure 5A. This
type of scale comes with a set of graduated weights, each
one half the value of the preceding one, such as 1 gram, 1/2
gram, 1/4 gram, 1/8 gram, etc.
You compare the unknown sample against these known values
by first placing the heaviest weight on the scale. If it tips
the scale you remove it; if it does not you leave it and go
to the next smaller value.
If that value tips the scale you remove it; if it does not you
leave it and go to the next lower value, and so on until you reach
the smallest weight that tips the scale. (When you get to the last
weight, if it does not tip the scale, then you put the next highest
weight back on, and that is your best answer.) The sum of all the
weights on the scale represents the closest value you can resolve.
In digital terms, we can analyze this example by saying that a "0"
was assigned to each weight removed, and a "1" to each
weight remaining -- in essence creating a digital word equivalent
to the unknown sample, with the number of bits equaling the number
of weights. And the quantizing error will be no more than 1/2 the
smallest weight (or 1/2 quantizing step).
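The weighing procedure translates almost line for line into code. The sketch below is a simplified model rather than a description of any particular converter. It assumes a unipolar input and binary weights starting at half the reference, and it stands in for the comparator with a plain `<=` test. It also leaves out the final rounding adjustment mentioned in the parenthetical above, so its answer can be low by up to one full quantizing step instead of one-half.

```python
def successive_approximation(sample_v, bits, v_ref):
    """Return (code, weighed_volts) after `bits` comparator decisions."""
    code = 0
    on_scale = 0.0          # sum of the weights currently left on the scale
    weight = v_ref / 2.0    # heaviest weight first (most significant bit)
    for _ in range(bits):
        if on_scale + weight <= sample_v:
            # Scale does not tip: leave the weight on, record a "1".
            on_scale += weight
            code = (code << 1) | 1
        else:
            # Scale tips: remove the weight, record a "0".
            code = code << 1
        weight /= 2.0       # next weight is half the previous one
    return code, on_scale


code, volts = successive_approximation(2.0, bits=3, v_ref=5.0)
print(f"3-bit result: {code:03b} ({volts} V)")

code16, volts16 = successive_approximation(2.0, bits=16, v_ref=5.0)
print(f"16-bit result: {code16:016b} ({volts16} V)")
```

One pass through the loop is made per bit, which is why a 16-bit conversion needs 16 comparisons for every sample.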
As stated earlier the successive approximation technique must repeat
this cycle for each sample. Even with today's technology, this is
a very time-consuming process and is still limited to relatively
slow sampling rates, but it did get us into the 16-bit, 44.1 kHz
digital audio world.