There are two ways to assess the quality of audio devices: measuring and listening.
Measuring is usually the better choice because the results are absolute and repeatable, since they avoid the vagaries of human hearing perception. But when measuring isn’t practical or possible, a listening test using a music source is perfectly fine.
For example, listening is needed to compare CD quality at a 44.1 kHz sample rate to “high-definition” audio at 96 kHz. Both will measure the same if the frequency response is limited to the audible range, but some people believe they sound different.
Another example is comparing MP3 bit-rates, especially higher values such as 256 versus 320 kbps. It’s pretty much impossible to “measure” the effect of lossy compression using traditional means because the frequency response changes from moment to moment.
Listening tests are also useful for comparing loudspeakers because there are so many variables: off-axis response, low-frequency roll-off slope in dB per octave, distortion that varies continuously with volume level, and separate distortion amounts for the woofer and tweeter. These can all be measured perfectly well in a million-dollar anechoic chamber, but not so well at home.
Further, it’s probably impossible to measure the subjective effect of devices that add distortion or other color intentionally. Does an Aphex Aural Exciter sound better than a BBE Sonic Maximizer? Do plugin versions of vintage compressors sound the same (or at least as good) as the original hardware? Does a tape-sim plugin really sound like tape?
Only you can decide what sounds “better” to you, though you might be fooled, or at least biased, by various factors such as knowing whether you’re hearing a real guitar amp or a plugin simulation. So verifying your own perception is another use for a listening test.
Performing a proper listening test is a lot more complicated than many realize, and several conditions must be satisfied for the results to be valid. When the perceived differences are subtle – and even when they’re not so subtle – you can’t just play one thing, then another, and proclaim a winner.
The differences between modern high-fidelity audio devices are usually very subtle, at least when operated at normal levels to avoid distortion or noise. Indeed, listening tests have shown repeatedly that people are unable to tell one competent audio device from another, no matter how much they differ in price.
First, and perhaps most important, a listening test must be blind. If the listener can see which device or source is playing, that will influence their opinion. Nobody is immune from sighted bias, and no reasonable person should object to being tested blind. If you’re so certain that you can tell HD audio from CD quality, then you should be able to do that when you can’t see which is playing.
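One simple way to self-administer a blind test is to have a script decide the presentation order so the listener can’t predict which device is playing. This is my own sketch of that idea, not a procedure from the article; the trial count and seed are arbitrary:

```python
# Illustrative sketch (not a prescribed procedure): build a random,
# hidden A/B presentation order, then score the listener's guesses
# against it only after all answers are recorded.
import random

def blind_order(num_trials, seed=None):
    """Return a random sequence of 'A'/'B' labels, one per trial."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(num_trials)]

def score(guesses, order):
    """Count how many guesses matched the hidden order."""
    return sum(g == o for g, o in zip(guesses, order))

# An assistant (or a second script) plays the sources in this order;
# the listener sees only trial numbers, never the labels.
hidden = blind_order(8, seed=42)
```

In practice the order should be generated by someone, or something, other than the person listening, and revealed only after the test is complete.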
Just as important, the audio sources being compared must be the same musical passage or sound sequence. You can’t compare devices by playing different parts of a song because the source sound itself changes! That makes it impossible to separate the source changes from any A/B device differences.
So it’s not valid to start playing a piece of music, then switch between A and B as the music continues. A section of music must be played through one device, then switch devices and play the same section again.
Further, you can’t just do this once because the listener has a 50-50 chance of being correct just by guessing. Therefore, at least five or six tests are needed – or even more – to be certain the listener really can hear a difference reliably. I’ll address the number of tests needed later in this discussion.
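The reasoning above is just the binomial distribution with a 50 percent chance per trial. As a quick illustration (my own arithmetic, not figures from the article), this computes the odds of scoring well by pure guessing:

```python
# Probability of getting at least `correct` answers right out of
# `trials` by pure guessing (p = 0.5 per trial), via the binomial
# distribution.
from math import comb

def p_by_chance(correct, trials):
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(p_by_chance(1, 1), 4))    # 0.5    -- one trial is a coin flip
print(round(p_by_chance(6, 6), 4))    # 0.0156 -- 6 of 6: ~1.6% by luck
print(round(p_by_chance(9, 10), 4))   # 0.0107 -- 9 of 10: ~1.1% by luck
```

So six correct answers out of six trials leaves only about a 1.6 percent chance the listener was guessing, which is why a handful of trials, rather than one, is the minimum for a meaningful result.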
You can’t compare different performances either. A common mistake is comparing microphones or preamps by recording someone singing or playing a guitar with one device, then switching to the other device and performing again.
The same subtle details we listen for when comparing gear also differ between performances – for example, a bell-like attack of a guitar note or a certain sparkling sheen on a brushed cymbal. Nobody can play or sing exactly the same way twice or remain perfectly stationary. So that’s not a valid way to test microphones, preamps, or anything else. Even if you could sing or play the same, a change in mic position of even half an inch is enough to make a real difference in the frequency spectrum captured by the microphone.
The A and B volume levels must also be matched to within 0.1 dB, or as close to that as possible. Very small level differences often don’t sound louder or softer, just slightly different. Larger volume differences have a substantial effect on perceived frequency response due to the Fletcher-Munson effect, where both low and high frequencies become more prominent at louder volumes.
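For reference, the relationship between measured voltage and decibels is simple to compute. This sketch (my own illustration) shows how small a 0.1 dB mismatch really is, and the gain needed to bring two levels into line:

```python
# dB difference between two RMS voltage measurements, and the
# linear gain factor that would match level B to level A.
import math

def db_difference(rms_a, rms_b):
    """Level difference in dB: 20 * log10 of the voltage ratio."""
    return 20 * math.log10(rms_a / rms_b)

def matching_gain(rms_a, rms_b):
    """Linear gain to apply to B so it matches A's level."""
    return rms_a / rms_b

# A 0.1 dB mismatch is only about a 1.16% difference in voltage:
print(round(10 ** (0.1 / 20), 4))   # 1.0116
```

That 1 percent voltage difference is why a voltmeter, rather than ears alone, is needed to match levels this closely.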
You can match volume levels through electronic devices, such as preamps and equalizers, using a 1 kHz sine wave (click here to download a .wav file from my website). A decent voltmeter is also needed, though a recorder with a large analog VU meter is a reasonable second choice. 1 kHz is standard for audio testing because it’s in the middle of the audio range, avoiding the response errors many devices exhibit at the low and high frequency extremes.
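If you’d rather generate the tone yourself, a 1 kHz sine wave is trivial to synthesize. This is a minimal sketch using only Python’s standard library; the file name, 5-second length, and -20 dBFS level are my own choices, not the article’s:

```python
# Generate a 1 kHz sine calibration tone as a 16-bit mono WAV file.
import math
import struct
import wave

SAMPLE_RATE = 44100
FREQ = 1000.0                   # 1 kHz: mid-band, away from response extremes
AMPLITUDE = 10 ** (-20 / 20)    # -20 dBFS peak, a safe calibration level (assumed)
SECONDS = 5

frames = bytearray()
for n in range(SAMPLE_RATE * SECONDS):
    sample = AMPLITUDE * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
    frames += struct.pack("<h", int(sample * 32767))  # 16-bit little-endian

with wave.open("tone_1khz.wav", "wb") as w:
    w.setnchannels(1)           # mono
    w.setsampwidth(2)           # 16-bit samples
    w.setframerate(SAMPLE_RATE)
    w.writeframes(bytes(frames))
```

Play the resulting file through each signal chain in turn and adjust until the voltmeter reads the same for both.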
However, when comparing acoustic sounds in the air, a better source for calibrating loudspeaker or microphone levels is pink noise that’s been band-limited to contain only midrange frequencies. Acoustic waves in a room create numerous peak and null locations only inches apart. The advantage of noise is that it contains multiple frequencies, so if your microphone is in a null at 1,000 Hz it’s probably not in a null at 900 Hz or 1,100 Hz. And as with sine waves, low and high frequencies are best avoided because that’s where loudspeakers and mics deviate most from flat response.
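One way to construct such a signal, again only a sketch of my own and not the article’s test file, is to sum many sine components at random frequencies and phases within a midrange band, weighting each amplitude by 1/√f so power falls 3 dB per octave as pink noise requires. The 500 Hz–2 kHz band limits here are assumptions:

```python
# Band-limited pink-ish noise by additive synthesis: random-phase
# sine components between LOW and HIGH Hz, each weighted 1/sqrt(f)
# so spectral power density falls 3 dB per octave.
import math
import random

SAMPLE_RATE = 44100
LOW, HIGH = 500.0, 2000.0   # assumed midrange band limits

def pink_band_noise(num_samples, num_components=200, seed=1):
    rng = random.Random(seed)
    comps = [
        (f, rng.uniform(0, 2 * math.pi), 1 / math.sqrt(f))
        for f in (rng.uniform(LOW, HIGH) for _ in range(num_components))
    ]
    out = []
    for n in range(num_samples):
        t = n / SAMPLE_RATE
        out.append(sum(a * math.sin(2 * math.pi * f * t + ph) for f, ph, a in comps))
    peak = max(abs(s) for s in out)      # normalize to +/-1.0 peak
    return [s / peak for s in out]
```

The samples could then be written to a WAV file the same way as the sine tone above, and played through each speaker while reading the level on an SPL meter.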