June 17, 2013, by Steve Olszewski
While cleaning out some of my numerous files I came across an old article about the Grateful Dead.
No, not the one about them laying down a 24-track tape bias bed of ambient desert noise on the multi-track machine before actually recording music, but rather, the one about using a square-wave impulse generator and a triggered oscilloscope to physically “time align” a multi-stage loudspeaker stack.
Very hip at the time, especially since it predated the mainstream arrival of FFT, TEF, Smaart, digital delays, and other time domain tools. The idea was that if you could see on the scope the leading edge of a square wave impulse from one loudspeaker stage, say the mids, mark it with a grease pencil on the scope face, and then measure and match the other stages to this time point by physically moving the loudspeakers, you would get more coherent timing out of the total system.
I did this to one of my earlier fixed installs that had a bunch of horns and bass bins in a complex array, and I was able to get the relative inter-box timing within 6 microseconds at crossover frequency. Not bad for using an oscilloscope and a homemade impulse generator. Why would anyone bother to explore fine details like this?
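To put that 6-microsecond figure in perspective, a quick back-of-the-envelope calculation (assuming sound travels at roughly 343 meters per second in room-temperature air) shows how little physical offset it represents:

```python
SPEED_OF_SOUND = 343.0   # m/s in air at about 20 C (assumed round figure)

delay_s = 6e-6           # 6 microseconds of inter-box timing error
offset_mm = SPEED_OF_SOUND * delay_s * 1000.0  # convert meters to millimeters

print(round(offset_mm, 2))  # about 2.06 mm of physical driver offset
```

In other words, 6 microseconds of timing error corresponds to roughly two millimeters of driver offset, which is why this kind of alignment is painstaking work.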
The answer is that it is a pursuit of audio reinforcement perfection - or at least improvement - and a commitment to providing the best sonic result possible from a loudspeaker system.
People might actually listen to the system and appreciate the increase in clarity and the reduction of time wash and phase smear, even if they didn’t know what any of that was.
As audio engineers and music lovers who make their living or satisfy their passion by providing sound systems or mixing on them, we spend a great deal of time vetting the best reinforcement systems, and then we use extremely advanced diagnostic tools to dial in the system and venue.
And loudspeaker manufacturers use many of these same tools to dial in components, crossovers and systems.
Once again, this is done, at least ostensibly, in a pursuit of audio excellence. So, with all this investment in time and energy toward making playback and reinforcement systems sing to perfection, why do we accept some audio sources that do nothing but make all that hard alignment work moot?
We’re talking digitized and packetized audio media here. While digital audio can be quite pristine, especially at its initial high-bit-rate analog-to-digital encoding resolutions, the final material that gets into the loudspeaker systems of nearly everyone outside the studio and the mastering house is often a worn and beat-up shadow of the original work.
If you use, say, a 24-bit 88.2 kHz native digital audio encoder to encode your perfectly selected and balanced binaural analog recording into your Pro Tools suite and listen to the playback, you’re most likely to believe that the digital copy is extremely accurate.
The original encoded copy will sound good, perhaps even stunningly so to the average listener. But going beyond the initial high-quality encoding to digital, especially with a bent toward saving bandwidth and storage space, or toward storing and playing the file on modern portable media players, starts you down a path of potential destruction.
Any time you go from the analog to the digital audio domain, the first thing you lose is the continuous variability of the natural analog waveform. Digital encoding slices the continuous wave into chunks of 1s and 0s - a lot of chunks, but chunks nonetheless (Figure 1).
Figure 1.
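Those chunks can be made concrete with a toy sketch (pure Python; the test value and bit depths are chosen purely for illustration): snapping one "analog" value to a finite grid of levels always leaves a residual error, and coarser grids leave more of it.

```python
import math

x = math.sin(1.0)  # stand-in for one instantaneous analog sample value

errors = {}
for bits in (8, 16, 24):
    step = 2.0 / (2 ** bits)            # level spacing across the [-1, 1) range
    quantized = round(x / step) * step  # snap to the nearest digital level
    errors[bits] = abs(quantized - x)   # residual quantization error
    print(bits, errors[bits])
```

Running this shows the error shrinking as bit depth grows, which is the sense in which more chunks get you closer to, but never identical with, the continuous wave.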
Can we hear those chunks? That depends on the encoding scheme, who you are, and how capable you are of discerning what’s good or bad.
While highly arguable, there are people out there who say that as soon as you encode to digital at any bit rate, you’re toast. These are mostly 2-channel audiophile types with $100,000 worth of loudspeakers, tube amps, turntables and an attitude, but they have a point: when you start eliminating pieces of information, and most if not all encoding guesses at or ignores what should be included and what gets left out of the signal, something gets left behind.
You start to lose highly accurate transient response, as the encoder doesn’t know if it’s sampling at the top of a beat, or before, or after. You also start losing the very delicate spatial cues that make the panorama three-dimensional. The result is sound that at some point risks becoming flat and lifeless.
The Nyquist theorem says that if we sample a band-limited waveform at more than twice its highest frequency, we capture enough points to reconstruct the continuous wave between them and fill in the missing parts.
This is one reason why standardized 44.1 kHz CD encoding was accepted, albeit not without a fight, as it yielded a top-end frequency response of over 20 kHz. Plenty of response to satisfy our typical human hearing limit of 20 kHz, even though reconstruction has to interpolate between samples, especially at high frequencies where only a few samples per cycle remain.
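A rough sketch of that fill-in-the-gaps idea, using a truncated Whittaker-Shannon sinc interpolation in pure Python (the 1 kHz tone, buffer length, and window width are arbitrary choices for illustration, not anything from a real converter):

```python
import math

def reconstruct(samples, fs, t, half_width=400):
    # Truncated Whittaker-Shannon interpolation: sum sinc-weighted samples
    center = int(round(t * fs))
    total = 0.0
    for n in range(max(0, center - half_width),
                   min(len(samples), center + half_width)):
        u = t * fs - n
        total += samples[n] * (math.sin(math.pi * u) / (math.pi * u) if u else 1.0)
    return total

fs = 44100.0  # CD sample rate; Nyquist limit is fs / 2 = 22050 Hz
f = 1000.0    # a 1 kHz tone, safely below the Nyquist limit

samples = [math.sin(2 * math.pi * f * n / fs) for n in range(2000)]

t = (882 + 0.37) / fs  # a point between sample instants, away from buffer edges
exact = math.sin(2 * math.pi * f * t)
print(abs(reconstruct(samples, fs, t) - exact))  # small interpolation error
```

For a tone well below the Nyquist limit, the interpolated value lands very close to the true continuous waveform, which is what makes 44.1 kHz sampling defensible for 20 kHz hearing in the first place.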
But consider this: a splash cymbal can create overtones in the 30 kHz range or higher. Can we hear that directly? Not unless you’re a dog or a freak of nature, but sound, especially musical sound, is highly predicated on the interdependence between frequencies, even ones that are “out of band” of normal hearing.
Intermodulation distortion is an example of this interdependence. So is spatiality. So is a good mix. What we call “CD quality” is generally perceived as the best mainstream result for portable digital music, trumped only by more esoteric disc formats like SACD, which by the way sound very good, even to trained ears. But who owns an SACD player?
Not many people. I bet more people have turntables hanging around than SACD players. CD quality is a bit like a one-eyed man being king in a world of blind people - it just seems to be as good as it gets because there isn’t anything better readily available to the masses. From CD quality, digital audio compression algorithms bring us to “near CD quality” and from there it just turns into sonic crap that makes downloading, storing and porting tunes between devices easier.
Compression certainly doesn’t improve audio quality - once content is gone, it’s gone.
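The gulf between uncompressed CD audio and even the best MP3 is easy to quantify, since the CD data rate follows directly from its sample rate, bit depth, and channel count:

```python
fs, bits, channels = 44100, 16, 2        # Red Book CD audio parameters
cd_kbps = fs * bits * channels / 1000.0  # 1411.2 kbps, uncompressed

mp3_kbps = 320.0  # MP3's highest common constant bitrate

print(cd_kbps, cd_kbps / mp3_kbps)  # the encoder must shed over 3/4 of the data
```

Even a "high quality" 320 kbps MP3 carries less than a quarter of the CD's raw data, and everything that didn't fit had to be thrown away somewhere.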
Enter the MP3. MP3 is an audio compression standard that somehow got in the door like one of your kid’s bad friends. MP3’s real-world viability is predicated mostly on getting music across networks and onto storage devices while using a minimal amount of space.
But the MP3, even at its highest bit rate, is little more than a thief that stole your music quality. MP3 uses a “lossy” compression scheme, meaning that bits of information are not just missing due to native encoding practices; the loss continues to build as you compress the material into the MP3 format.
This thievery in MP3s is called “perceptual coding,” meaning that frequencies beyond our “perceptual ability” are considered less important and can be removed. Would we remove a drum solo from a concert because we didn’t think there were drummers in the room who would notice? No way.
The problem cascades with all the various encoded audio formats that are available; go from CD to MP3 and you lose information and quality.
Take that MP3 and convert it to WMA and you lose even more. The more you change formats through transcoding, whether you’re going to a greater or lesser bit rate, the more quality you lose.
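A crude way to see why transcoding compounds the damage is a toy model in which simple requantization to two incommensurate grids stands in for the very different loss mechanisms of real perceptual codecs (the step sizes and sample value here are invented for illustration only):

```python
def lossy_encode(x, step):
    # Toy "codec": snap the value to this format's nearest representable level
    return round(x / step) * step

x = 0.123456789                    # the original "master" value (arbitrary)

gen1 = lossy_encode(x, 1 / 100)    # first lossy encode (think "MP3")
gen2 = lossy_encode(gen1, 1 / 77)  # transcode to a second lossy format ("WMA")

print(abs(gen1 - x), abs(gen2 - x))  # error versus the master grows per generation
```

Because the second format's levels don't line up with the first's, the second encode adds its own rounding on top of the first, and the distance from the master only grows.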
Why then do we tolerate a music delivery system that promotes reduction of sound quality in recorded source material, especially when there are coding schemes that are “lossless,” like Free Lossless Audio Codec (FLAC) and Apple Lossless (ALAC)?
These lossless codec examples may not be the be-all of compression encoding, but allowing the herd mentality of a new generation of music lovers - who just want the tunes downloaded to their computer and onto their iPod before they finish their Rockstar energy drink and go back to playing World of Warcraft - to drive the accepted paradigm of modern portable digital music quality is insane.
It doesn’t have to be this way and it shouldn’t.
The conversion of music to the bit stream is greatly empowering, in ways both welcome and disruptive. The reach and ease of using the Internet to deliver music is seductive and panders to the power of the network and pop culture. Unfortunately, that pandering brings us squarely up against the current limitations of the modern communications medium, which places a priority on compactness, portability and speed over quality.
Bit-stream-based businesses and manufacturers also want to sell things: music players, music download services and Google-style ads to a young and digitally empowered generation. As storage becomes less expensive and network bandwidth and speed become greater, we can look to a future where we do not need to place limitations on audio quality for network or storage efficiency.
The key is to not lose sight of what quality audio really means, and to prevent a new generation of listeners from believing that the current music download and transport technologies, and the limited result they deliver, are the only way to go.
As live audio folks who listen to the real thing night after night, venue after venue, let’s teach our children and our understudies well the merits of good sound reproduction. If we don’t, then simple minds will get what is pushed upon them, good or bad.
The modern data world and the culture it engenders generally doesn’t understand or care that stored music must be an honest and accurate representation of the original performance.
Let’s all work to change that mindset.
Steve Olszewski is a musician, soundman, technologist, ex-road dog and black-belt martial artist.