Study Hall

Lost In Translation: Is New Sound Good Sound?

Stored music must be an honest and accurate representation

While cleaning out some of my numerous files I came across an old article about the Grateful Dead.

No, not the one about them laying down a 24-track tape bias bed of ambient desert noise on the multi-track machine before actually recording music, but rather, the one about using a square-wave impulse generator and a triggered oscilloscope to physically “time align” a multi-stage loudspeaker stack.

Very hip at the time, especially since it predated the mainstream arrival of FFT, TEF, Smaart, digital delays, and other time domain tools. The idea was that if you could see the leading edge of a square wave impulse from one loudspeaker stage, say the mids, on the scope, mark it with a grease pencil on the scope face, and then measure and match the other stages to that time point by physically moving the loudspeakers, you would get more coherent timing out of the total system.

I did this to one of my earlier fixed installs that had a bunch of horns and bass bins in a complex array, and I was able to get the relative inter-box timing within 6 microseconds at crossover frequency. Not bad for using an oscilloscope and a homemade impulse generator. Why would anyone bother to explore fine details like this?

The answer is that it’s a pursuit of audio reinforcement perfection – or at least improvement – and a genuine care about providing the best sonic result possible from a loudspeaker system.

People might actually listen to the system and appreciate the increase in clarity and the reduction of time wash and phase smear, even if they didn’t know what any of that was.
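As a side note, the same inter-box measurement can be sketched in software today. Here’s a minimal, hypothetical Python example – none of it from the original rig – that estimates the arrival-time offset between two captured impulses via cross-correlation, and shows how little physical distance 6 microseconds actually represents:

```python
# A sketch only: estimate the time offset between two loudspeaker stages
# from captured impulse responses. Signals and capture rate are made up.
import numpy as np

fs = 192_000  # assumed measurement sample rate, Hz

def delay_between(ref: np.ndarray, other: np.ndarray, fs: int) -> float:
    """Return the time offset (seconds) of `other` relative to `ref`."""
    corr = np.correlate(other, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)  # peak location = lag in samples
    return lag / fs

# Two synthetic "impulse" captures: the same click, one arriving 6 samples later.
click = np.zeros(1024)
click[100] = 1.0
mids = click
lows = np.roll(click, 6)  # ~31 us late at 192 kHz

print(f"offset: {delay_between(mids, lows, fs) * 1e6:.1f} us")

# For scale: 6 us of misalignment is about 2 mm of physical offset.
speed_of_sound = 343.0  # m/s at roughly 20 degrees C
print(f"6 us of misalignment -> {6e-6 * speed_of_sound * 1000:.1f} mm")
```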

As audio engineers and music lovers who make our living or satisfy our passion by providing sound systems or mixing on them, we spend considerable time vetting the best reinforcement systems, and then we use extremely advanced diagnostic tools to dial in the system and venue.

And loudspeaker manufacturers use many of these same tools to dial in components, crossovers, and systems.

Once again, this is done, at least ostensibly, in a pursuit of audio excellence. So, with all this investment in time and energy toward making playback and reinforcement systems sing to perfection, why do we accept some audio sources that do nothing but make all that hard alignment work moot?

We’re talking digitized and packetized audio media here. While digital audio can be quite pristine, especially at its initial high-bit-rate analog-to-digital encoding resolution, the final material that reaches the loudspeaker systems of almost everyone outside the studio and the mastering house is often a worn and beaten-up shadow of the original work.

If you use, say, a 24-bit 88.2 kHz native digital audio encoder to encode your perfectly selected and balanced binaural analog recording into your Pro Tools suite and listen to the playback, you’re most likely to believe that the digital copy is extremely accurate.

The original encoded copy will sound good, perhaps even stunningly so, to the average listener. But going beyond that initial high-quality digital encoding – especially with a bent toward saving bandwidth and storage space, or toward storing and playing the file on modern portable media players – starts you down a path of potential destruction.
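To put rough numbers on that bandwidth pressure, here’s a back-of-the-envelope Python sketch (illustrative figures only, not tied to any particular product) comparing the raw data rate of a 24-bit/88.2 kHz stereo master against a common lossy stream:

```python
# Raw PCM bitrate of the high-resolution master.
bits, rate_hz, channels = 24, 88_200, 2
pcm_bps = bits * rate_hz * channels
print(f"24/88.2 stereo PCM: {pcm_bps / 1e6:.2f} Mbit/s")

# Storage for a 4-minute track, in megabytes.
song_seconds = 4 * 60
print(f"4-minute track: {pcm_bps * song_seconds / 8 / 1e6:.0f} MB")

# Compare against a typical 128 kbit/s lossy stream.
mp3_bps = 128_000
print(f"vs. 128 kbit/s lossy: ~{pcm_bps / mp3_bps:.0f}x smaller")
```

That roughly 33-to-1 squeeze is exactly where the temptation to throw information away comes from.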

Any time you go from the analog to the digital audio domain, the first thing you lose is the continuous variability of the natural analog waveform. Digital encoding slices the continuous wave into chunks of 1s and 0s – a lot of chunks, but chunks nonetheless (Figure 1).

Figure 1: Digital encoding slices the continuous analog wave into discrete chunks.
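A minimal sketch of those chunks, assuming nothing fancier than plain PCM quantization: sample a stand-in “continuous” sine wave, round it to a fixed number of steps, and look at the error the slicing leaves behind.

```python
# Quantize a sine to a given bit depth and measure the slicing error.
# All parameters are arbitrary, chosen for readability.
import numpy as np

fs, bits = 48_000, 16
t = np.arange(0, 0.01, 1 / fs)         # 10 ms of time
analog = np.sin(2 * np.pi * 440 * t)   # stand-in for the continuous wave

levels = 2 ** (bits - 1)
digital = np.round(analog * levels) / levels  # slice into 2^bits steps

error = analog - digital
print(f"peak quantization error at {bits} bits: {np.max(np.abs(error)):.2e}")
# ~1.5e-5 of full scale at 16 bits; dropping to 8 bits makes it ~256x larger.
```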

Can we hear those chunks? That depends on the encoding scheme, who you are, and how capable you are of discerning what’s good or bad.

While highly arguable, there are people out there who say that as soon as you encode to digital at any bit rate, you’re toast. These are mostly 2-channel audiophile types with $100,000 worth of loudspeakers, tube amps, turntables and an attitude, but they have a point: most encoding schemes guess at, or simply ignore, what should be included in the signal and what gets left out, and once you start eliminating pieces of information, something gets left behind.

You start to lose highly accurate transient response, as the encoder doesn’t know if it’s sampling at the top of a beat, or before, or after. You also start losing the very delicate spatial cues that make the panorama three-dimensional. The result is sound that, at some point, risks becoming flat and lifeless.
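As a deliberately simplified illustration of that transient point – real band-limited reconstruction can recover some sub-sample timing, but lossy encoders throw much of that detail away – here’s a toy Python calculation of how coarse the raw sample grid is:

```python
# How coarse is the sample grid? Illustrative numbers only.
fs = 44_100
period_us = 1e6 / fs
print(f"sample period at 44.1 kHz: {period_us:.1f} us")
print(f"timing detail below ~{period_us / 2:.1f} us is not directly captured")

# Two clicks arriving ~11 us apart land on the identical sample index:
for arrival_s in (0.0100000, 0.0100113):
    print(f"arrival {arrival_s * 1e6:.1f} us -> sample {round(arrival_s * fs)}")
```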
