Thursday, September 02, 2010
Audio Basics: The Perception Of Sound - A Primer
A comprehensive guide covering the basics that every engineer should understand about the perception of sound, and how to use that knowledge to better their craft.
The goal of this guide is to help develop knowledge of basic acoustic principles.
In turn, this will help you to understand, and eventually master, the basic techniques of sound engineering and recording.
Each section has a theme that is first defined in technical terms and then further explained in practical terms with respect to audio.
Timbre
Timbre (pronounced /tam-ber’/) is a sound’s identity. This identity depends on the physical characteristics of the sound’s medium (the matter or substance that supports the sound).
Let’s take an A at 440 Hertz produced at 60 decibels: we can immediately tell if the sound was emitted from a violin, saxophone, or piano.
Yet even though the instruments differ, it’s the same note at the same amplitude. The difference lies in how the sound is produced: a string, an air column, etc.
Plus, the sound isn’t generated by the same “tool”: a bow for violin strings, a reed and an air column for the sax, and felt-covered hammers that strike the piano strings.
It’s the different physical characteristics of the medium and the “tool” that determine the characteristic sound waves in each case. Later we will also see how a sound chamber adds another dimension to this definition.
Waveform
The most basic waveform is the sine wave (sinusoid) (fig. 1). It could be considered the atom of sound. Pure sinusoidal sounds are rare (tuning forks, drinking glasses being rubbed) and were at one time thought to have strange powers over human behavior. Most sounds that surround us are more complex.
This means that inside a sound we perceive as a single entity, there is a superposition of many sine waves that have, in a way, fused together into one sound.
It’s the nature of this superposition that determines the resulting waveform (fig. 2) and that is responsible for the sound’s timbre. The set of component sine waves, described by their frequencies and amplitudes, is called a spectrum.
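To make the idea of superposition concrete, here is a minimal Python sketch that builds one waveform by adding sine waves sample by sample. The function name `synthesize`, the `(frequency, amplitude)` pair format, and the 8 kHz sample rate are illustrative assumptions, not part of any particular tool.

```python
import math

def synthesize(partials, sample_rate=8000, duration=0.01):
    """Sum sine-wave partials into one waveform.

    partials: list of (frequency_hz, amplitude) pairs (hypothetical format).
    Returns a list of float samples.
    """
    n_samples = round(sample_rate * duration)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # Superposition: each output sample is simply the sum of all components
        samples.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in partials))
    return samples

# A 100 Hz fundamental plus two weaker components at 200 and 300 Hz
wave = synthesize([(100, 1.0), (200, 0.5), (300, 0.25)])
```

Changing the amplitudes in the list changes the shape of `wave` without changing its period. That shape change is the timbre difference described above.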
Spectral Representation
There are many ways of graphically representing sound. For instructional purposes we have chosen to use a spectrogram for its clarity and simplicity.
Horizontally: time in seconds. Vertically: frequency in Hertz. A sine wave (sinusoid) at 100 Hertz is represented by a horizontal line at a height corresponding to 100.
A harmonic sound at 100 Hertz is represented by superimposed lines corresponding to sine waves of 100, 200, 300 Hertz, and so on: n × 100 Hertz. The length of each line represents the duration of the sound.
Noise
Let’s imagine a case where all sine wave frequencies that are perceptible to the human ear (from 20 Hz to 20 kHz), all with the same amplitude, are “mixed” into one sound signal. We get what is called “white noise”, in other words a “hiss”. If the white noise is very short, we perceive it as a kind of short percussive sound.
Consonants belong to this category, as does the noise a sound medium produces when it receives the attack of the “tool” that “kick-starts” it.
This noise corresponds to the time it takes for the sound wave to stabilize and take its final form. The “rubbing” of a bow on a string is similar to a hissing sound, while a hammer hitting a piano string is similar to a percussive sound.
These notions will be dealt with in greater depth when we get to envelopes and transients. When noise frequencies are confined between certain limits, we refer to them as noise bands.
If one zone carries noticeably more energy than the rest, we speak of colored noise around that zone. Pink noise, for example, is white noise filtered so that its power density decreases by 3 dB per octave.
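As a quick numerical check on the pink noise figure, the sketch below uses a hypothetical helper, `pink_level_db`, and assumes an ideal pink spectrum whose power density is proportional to 1/f. Each doubling of frequency then lowers the density by 10·log10(2), about 3.01 dB, which is where the rounded “3 dB per octave” comes from. Ideal white noise would stay at 0 dB at every frequency.

```python
import math

def pink_level_db(freq_hz, ref_hz=1000.0):
    """Relative power density (dB) of ideal pink noise, whose density is proportional to 1/f.
    Ideal white noise would be 0 dB at every frequency (flat density)."""
    return 10 * math.log10(ref_hz / freq_hz)

# One value per octave band, relative to 1 kHz
for f in (125, 250, 500, 1000, 2000, 4000):
    print(f, round(pink_level_db(f), 2))
```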
Harmonic Sounds
Having already highlighted the superimposed or complex aspect of sound, we are now going to focus on a specific category of frequencies in a sound spectrum: harmonics.
A harmonic sound is a periodic sound whose component sine waves follow the mathematical principle of the Fourier series, which can be stated as follows: a complex periodic signal is made up of a certain number of component frequencies that are integer multiples of the fundamental frequency.
An example of a harmonic sound: a sound at 100 Hertz whose component waves are 100, 200, 300, 400, 500, and 600 Hertz. The perceived pitch corresponds to the lowest frequency: 100 Hertz. The following components (2 × 100, 3 × 100, 4 × 100, etc.) are integer multiples of it and are called harmonics.
The lowest frequency, on which they are based, is called the fundamental. The number, or “rank”, of a harmonic is the integer by which the fundamental is multiplied. For example, the 3rd harmonic is the one at 300 Hz (fig. 5).
The pitch of a harmonic sound is easily perceived by the ear, and these sounds usually have an “in tune” quality about them. That’s why melodic musical instruments are designed to produce harmonic spectra.
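The harmonic rule above can be expressed in a few lines of Python. The helper names are hypothetical. `harmonic_series` lists the harmonics of a fundamental. `harmonic_rank` checks whether a frequency is an integer multiple of the fundamental and, if so, returns its rank.

```python
def harmonic_series(fundamental_hz, count):
    """Return the first `count` harmonics: rank n has frequency n * fundamental."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def harmonic_rank(freq_hz, fundamental_hz, tol=1e-9):
    """Return the rank of freq_hz if it is an integer multiple of the fundamental, else None."""
    ratio = freq_hz / fundamental_hz
    rank = round(ratio)
    return rank if rank >= 1 and abs(ratio - rank) < tol else None

print(harmonic_series(100, 6))   # [100, 200, 300, 400, 500, 600]
print(harmonic_rank(300, 100))   # 3
print(harmonic_rank(350, 100))   # None (not a harmonic of 100 Hz)
```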
Noises, like those we referred to earlier, are aperiodic signals. They are characteristic of percussion instruments for example.
Distribution Of Energy Across The Spectrum
Regions of relatively great intensity in a sound spectrum are called formants. In the case of a band of consecutive frequencies it is referred to as a formant zone between x and y Hertz.

This distribution of energy plays an important role in the perception of timbre, as do the number of components in the spectrum, their distribution, and their regularity or irregularity.
- a- violin: hiss noise at the attack, harmonic spectrum.
- b- flute: harmonic spectrum.
- c- piano: noise of the hammer attack, percussive sound, and a spectrum whose harmonics are not quite regular.
- d- warm sound: few harmonics but a regular distribution of energy from low to high.
- e- piercing sound: harmonic sound with a lot of intensity in the highs.
- f- hollow sound: few harmonics in the mids.
- g- nasal sound: weak lows, intense mids, weak highs.
- h- inharmonic sound: like an untuned bell.
- i- square signal, odd harmonics only: like a clarinet sound.
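You can check item i with the textbook Fourier series of an ideal square wave, which contains only odd harmonics with amplitudes 4/(π·n). The sketch below assumes that ideal waveform; a real clarinet only approximates it. It shows that summing the odd partials approaches the square wave’s flat top. The helper names are illustrative.

```python
import math

def square_partials(fundamental_hz, max_rank):
    """Fourier series of an ideal square wave: odd harmonics only, amplitude 4/(pi*n)."""
    return [(n * fundamental_hz, 4 / (math.pi * n)) for n in range(1, max_rank + 1, 2)]

def evaluate(partials, t):
    """Value of the summed sine partials at time t (seconds)."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)

partials = square_partials(100, 199)   # ranks 1, 3, 5, ..., 199
print([f for f, _ in partials[:4]])    # [100, 300, 500, 700]
# At a quarter period (t = 2.5 ms) the sum approaches the square wave's +1 plateau
print(round(evaluate(partials, 0.0025), 2))
```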
EQ
It’s the EQ section of a console that allows us to tweak or correct timbre. Depending on the model, the EQ section is more or less sophisticated and offers different adjustment possibilities.
We won’t be dealing with the simple high/low EQ knobs or switches found on hi-fi amplifiers or entry-level mixers, which are only meant to adapt a sound to a specific listening area.
We’re more concerned with the EQ controls found on small modern digital consoles or built into most major recording software.
We must keep in mind that EQ is mainly used for one reason: to correct, not in the hope of improving the recorded signal. You can never turn a mediocre recording (due to poor mic placement or even the quality of the mic itself) into a great sound with EQ alone.
Equalizers split the audible frequency range (20 Hz to 20 kHz) into several sub-ranges.
Thus one generally talks about highs, high mids, low mids, and lows. The first thing to do, then, before tweaking any knobs, is to determine in which frequency range the problem lies, and then the nature of the problem.
Is it excessive coloration that wasn’t detected during recording, unwanted noise from the environment, or masking caused by other instruments?
What Does It Look Like?
Equalizers are filters that act on a sound’s harmonics and partials. What sets them apart is that they can not only remove component frequencies but also boost chosen frequency zones.
Of course, if there isn’t anything in the signal in that range, boosting it will only add hiss!
Good EQ sections generally have 4 bands. Each offers at least 2 controls: frequency adjustment and gain.
These are called semi-parametric. There’s often a third setting, called bandwidth or “Q”, which widens or narrows the frequency range (bandwidth) affected by the filter.
When this third control is present, the equalizer is called a parametric equalizer. The frequency can be adjusted between the upper and lower limits of the filter’s sub-range (with software, these limits no longer exist!).
The gain knob sets, in dB, how much the filter affects the chosen frequency. As we can see in fig. 8, taken from Cubase, this gain can be positive or negative.
We can also see that the curve of the bandwidth can be wider (hump shape) or narrower (peak shape). This shape corresponds to the bandwidth which is adjusted by the Q setting.
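To show how the frequency, gain, and Q controls map onto an actual filter, here is a sketch of one peaking (“bell”) band. It uses the widely circulated Audio EQ Cookbook biquad formulas by Robert Bristow-Johnson. This is an assumption for illustration only. Cubase and hardware consoles may use different filter designs, and the 48 kHz sample rate is arbitrary.

```python
import cmath
import math

def peaking_eq(f0, gain_db, q, fs=48000.0):
    """Biquad coefficients for a peaking ("bell") EQ band, after the Audio EQ Cookbook.
    Returns (b, a), normalized so that a[0] == 1."""
    A = 10 ** (gain_db / 40)           # amplitude factor (square root of the linear gain)
    w0 = 2 * math.pi * f0 / fs         # center frequency in radians per sample
    alpha = math.sin(w0) / (2 * q)     # larger Q -> smaller alpha -> narrower peak
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def response_db(b, a, f, fs=48000.0):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(-1j * 2 * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))

b, a = peaking_eq(f0=3000, gain_db=6, q=1.0)
for f in (100, 1000, 3000, 10000):
    print(f, round(response_db(b, a, f), 2))
```

Raising `q` narrows the bell (the “peak” shape), and lowering it widens the bell (the “hump” shape). At the center frequency the boost equals `gain_db`. Far from it, the response returns toward 0 dB.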
How To Modify Timbre
You must always keep in mind that any EQing of an instrument is destructive with respect to the recorded sound, which is itself, in many cases, an imperfect copy of the original. So be careful!
Before touching anything, think about what you want to accomplish with EQing: I want a “warmer” sound, I want to cut the bass, I want my instrument to stand out in the mix, I want to get rid of that annoying resonance that came from the studio…
The chart below is offered as a kind of “quick guide”. It will serve as a checklist to help you control and master your timbre EQing. Don’t forget, however, to listen: your ears are the ultimate judge.
Advanced Timbre Topics
We’ve discussed a “physical” definition of timbre, stressed its “plural” aspect, and discussed the elements that make it up and define it.
Above all we discussed the infinite combinations between these spectral components.
Now let’s take this analysis further by focusing on how parameters such as intensity, duration, and space influence our perception of timbre.
In fact, these variables are not isolated from each other. By changing one, we affect another, either in a real, tangible way (meaning it can be physically measured) or subjectively (as perceived by our ears and brains).
Perceiving timbre as a whole is the result of our brains assimilating and evaluating all psychoacoustic parameters.
The Stevens Effect
This experiment concerns simple sounds (sine waves). At a given, constant frequency, for example 5000 Hz, the intensity is raised from about 40 dB to about 90 dB.
The listener genuinely perceives the sound as having become higher, by about 40 savarts, which is close to a whole step (a tempered whole step ≈ 50 savarts).
This perceived frequency variation exists only in the human ear; the physical frequency hasn’t changed. For frequencies below 1000 Hz, it’s the opposite: you have the impression that the sound gets lower as its intensity increases (see fig. 10). This is called the Stevens effect.
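You can check the savart figures with a short conversion. An interval between frequencies f1 and f2 measures 1000·log10(f2/f1) savarts. An octave is therefore about 301 savarts, and an equal-tempered whole step about 50. The helper names below are illustrative.

```python
import math

def savarts(f1, f2):
    """Pitch interval between two frequencies in savarts: 1000 * log10(f2 / f1)."""
    return 1000 * math.log10(f2 / f1)

def savarts_to_cents(s):
    """Convert savarts to cents (1200 cents per octave)."""
    return 1200 * math.log2(10 ** (s / 1000))

whole_step = 2 ** (2 / 12)                 # equal-tempered whole-step frequency ratio
print(round(savarts(1, whole_step), 1))    # 50.2
print(round(savarts(1, 2), 1))             # 301.0 (one octave)
print(round(savarts_to_cents(40)))         # 159
```

By this conversion, a 40-savart shift is roughly 159 cents, a little less than the 200 cents of a tempered whole step.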
Consequences Of The Stevens Effect
A real change in the intensity of the spectrum’s components creates a subjective variation in timbre (tone).
For a complex signal, in other words most of the sounds around us, when the intensity of a sound varies progressively, each component undergoes its own Stevens effect.
This shows how our ears can misperceive the components of a sound during, for example, a trumpet crescendo.
Can we still speak of harmonic spectra if certain frequencies make us believe that they are higher or lower than they really are? To our ears there is in fact a real sensation of timbral change when intensity increases.
It follows that any complex signal subject to a variation in intensity will be perceived as having had its timbre altered.
The Role Of Formants
In part I, we saw that a formant is a frequency of the spectrum which is particularly strong in terms of energy.
Some wind instrument players and classical singers use this fact to gain power (or at least to give you that impression) without really producing more energy. Trying to do so would be really difficult on their lungs.
How does this illusion work? It’s a little like the trick question about which is heavier, a ton of feathers or a ton of lead. The instrumentalist works on their sound (timbre) to give the impression of increasing intensity.
Through their technique, they shift the energy in the sound’s spectrum, concentrating it around 3000 Hz, where our ears are most sensitive and respond to the weakest intensities.
The listener perceives the sound as stronger, while on a console we’ll see only a small change on the VU meter.
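To illustrate the ear’s extra sensitivity in this region, the sketch below evaluates the standard A-weighting curve, using the analytic formula from IEC 61672. A-weighting is often used as a rough stand-in for the ear’s sensitivity at moderate levels. Treat it as a proxy that rests on assumptions. A-weighting is a measurement convention, not the mechanism singers and wind players rely on, but it does rise above its 1 kHz reference in the 2 to 4 kHz region, peaking near 2.5 kHz.

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), analytic formula from IEC 61672.
    Used here only as a rough proxy for how sensitive the ear is at each frequency."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.0   # +2.0 dB normalizes the curve to about 0 dB at 1 kHz

for f in (100, 500, 1000, 3000, 8000):
    print(f, round(a_weighting_db(f), 1))
```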
Real Modification Of Timbre & The Sensation Of Frequency Variation
When one part of the spectrum is filtered out, which can happen on stage because of an obstacle, you can get the unpleasant impression that the musician is out of tune: a little sharp or a little flat.
This happens because the obstacle absorbs a frequency band of the spectrum. This “hole” in the timbre can be enough to make us believe that the frequency has changed.
The effect is noticeable with detached (separated) sounds, which is generally the case in music. It is more flagrant if the instrument’s spectrum is only weakly harmonic or not harmonic at all: bell sounds or a xylophone, for example.
This doesn’t happen with a continuous harmonic sound. Our auditory memory retains the spacing between the harmonics, and the in-tune quality is preserved. Only changes in timbre due to the filtering are perceived.
Timbre & Duration: The Role Of Transients
Attack: We will deal with duration later, when envelope curves are discussed in detail. For now, though, it’s hard to discuss how timbre evolves over time without at least stressing how much the nature of the attack determines the spectrum that follows.
The tool used to excite a sound contributes components of its own: a cymbal struck with a drumstick won’t sound the same as one played with brushes.
Sustain: Only synthesizers can deliver a signal whose timbre is perfectly stable for the duration of the sound. This is exactly what people dislike about “samples”, which tend to lack contrast. With acoustic instruments, timbre is constantly evolving, with a certain amount of unpredictability due to physical factors and the human aspects of playing technique.
Release: The place where sound is captured, as well as the listening space, affects timbre. We’ll dedicate some articles to architectural acoustics at a later time.
For the moment, note that a room’s (or place’s) acoustics (its reverberation) modify the “release”, lengthening or shortening the time it takes for components to die away. Timbre is thus either “dilated” or “shortened” in time. Timbre from the same source can also be altered by the listening space, because attenuation isn’t uniform across frequencies.
Law Of Timbre Loss According To Distance
If you hear an orchestra outdoors in the distance, you will first perceive the bass.
Then, as you get closer, you’ll hear the mids, and once you’re close enough you’ll hear the highs. This happens because air absorbs high frequencies much more than low ones over distance, so the highs fade first. The speed of sound itself is essentially the same at all audible frequencies.
Depending on how far you are from the sound source, timbre is altered. Moving further away is associated more with a “lack” of highs than with a drop in the sound’s level. A distant sound can thus be recognized by its timbre.
Distortion: When you measure an amplifier’s performance, one of the figures measured is Total Harmonic Distortion (THD). When a signal passes through a non-linear device, additional content is added at harmonics of the original frequencies. THD measures the extent of that distortion.
This value is expressed as a percentage (%) and represents the amount of undesired harmonic content added to the signal at the device’s output. The related THD+N figure also counts noise and other parasitic content. The higher the value, the lower the quality of the device.
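Here is a minimal sketch of the usual THD calculation: take the square root of the summed squares of the harmonic amplitudes, then divide by the fundamental’s amplitude. The measurement values are hypothetical. Noise is deliberately left out; including it would give a THD+N figure.

```python
import math

def thd_percent(fundamental, harmonics):
    """Total harmonic distortion as a percentage.

    fundamental: amplitude of the fundamental.
    harmonics: amplitudes of the 2nd, 3rd, ... harmonics, in the same units.
    Noise is not included (a THD+N figure would add it to the numerator).
    """
    return 100 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

# Hypothetical measurement: 1 V fundamental, 30 mV 2nd harmonic, 40 mV 3rd harmonic
print(thd_percent(1.0, [0.03, 0.04]))   # about 5 %
```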
Yet, musicians sometimes seek distortion out: many guitarists tend to love distorted sounds and distortion pedals.
If you feed a sine wave into a device at a higher level than it can handle, you saturate the input and create a type of musical distortion: the energy lost in amplitude is transformed into harmonic components that enrich the timbre (Fig. 13).

Figure 13 shows the result, with an incoming sine wave. If we apply a complex sound wave, the increase in richness is much greater.
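The following sketch reproduces the experiment numerically under simple assumptions. One period of a sine is hard-clipped symmetrically at half its amplitude, and a direct DFT measures the first few harmonics. With symmetric clipping, only odd harmonics appear. Real devices often clip asymmetrically, which adds even harmonics as well.

```python
import cmath
import math

def harmonic_amplitudes(samples, max_rank):
    """Amplitude of harmonics 1..max_rank for a signal holding exactly one period,
    computed with a direct DFT at each harmonic bin."""
    n = len(samples)
    amps = {}
    for k in range(1, max_rank + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        amps[k] = 2 * abs(acc) / n
    return amps

N = 1024  # samples in one period of the test sine
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
clipped = [max(-0.5, min(0.5, x)) for x in sine]  # symmetric hard clipping at half amplitude

print({k: round(v, 3) for k, v in harmonic_amplitudes(sine, 5).items()})
print({k: round(v, 3) for k, v in harmonic_amplitudes(clipped, 5).items()})
```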
The Impact Of Timbre On Signal Level
We’ve already seen that if we move a signal’s energy zones towards the zone to which our ears are sensitive, we get the impression that the signal has intensified.
But…when you EQ you add or take away real electrical energy that has a real effect on the signal’s level. You must therefore keep an eye on the input gain level, which you might have to adjust in order to avoid clipping.
However, you could also bring a certain sound in a mix closer by simply increasing one of the EQ levels slightly (more likely in the mid-highs…).
Changes In Timbre Perception Due To Distance
As we’ve said, you can make a signal more present in a mix by adding highs to it, because high-frequency harmonics die out sooner over distance.
It follows that a distant source will sound more muted than the same source up close.
This is one way of mixing that favors realistic parameters by altering timbre instead of levels: the foreground will be more brilliant, the background less so.
A mix built this way gains realism and keeps a homogeneous sound that adjusting faders alone would not achieve.
Time In Acoustics
With stopwatch in hand, our perception of time seems straightforward. But in everyday life we’re not always watching the clock, and everyone knows that the passage of time is relative.
It differs from one person to another and especially from one activity to another: An hour spent watching a great movie doesn’t feel as long as an hour in traffic.
Scientists may conceive time in seconds, but most musicians feel it in a more fluctuating manner: either in the speeding up of a tempo or the slightly off-pitch note due to stress.
In fact, pitch, which is defined by frequency, is a value linked to time and depends on our perception of a second. If it seems longer or shorter, the note can seem sharp or flat.
It’s said that during the Middle Ages, long before the invention of the metronome, a person’s pulse was used as the tempo reference. It was therefore better to choose a calm musician.
An Experiment
For those of you who can remember magnetic tape, a piano note played in reverse doesn’t sound at all like a piano, and a verse of Shakespeare in reverse sounds strangely like…Swedish.

In fact, what our ears perceive as a single homogenous sound is really like a small train made up of four different cars: if we watch it as it moves forwards or backwards, the order of arrival won’t be the same and therefore our perception of the sound will be different.
This idea is expressed through the notion of the A.D.S.R. curve, also called the envelope curve. A “reverse” preset found on some reverbs manipulates nothing but the reverb envelope. The original reverb tail is typically a decaying sound that looks like figure 14. Played in reverse, its end comes before its beginning (figure 14b).
A.D.
For a sound to take place, you need two agents:
- An exciter: it brings in energy.
- The excited element: that which receives the energy and starts to vibrate, creating the sound wave.
For example, for a violin, the exciter is the bow and the excited element is the string. For a drum, the exciter is the drumstick and the excited element is the skin. The A.D.S.R. curve describes the relationship between time and energy in each of its four phases.
A For Attack
The Attack is the time it takes for energy to transfer from the exciter to the excited element. The attack is fundamental for all instruments, especially percussion.
The piano is a somewhat special case: a large part of its timbral identity is determined precisely by the type of attack, in other words, by the player’s technique.
Wind players have also learned to develop tonguing techniques by using, as the name implies, their tongues which let them create percussive-like attacks.
When recording a voice, you sometimes get a transient that’s too loud. Either the vocalist sings too loud, or a speaker overloads the mic on plosive consonants (d, t, p).
A compressor lets you limit the damage, but a pop filter in front of the mic will most likely fix the problem. The old trick of a Kleenex in front of the mic wasn’t always very effective; we can do better now.
D For Decay
Decay could be defined as the time a signal takes to stabilize itself; we can represent it as being the difference between the initial energy of the attack and that used to maintain it.
With an instrument with a non-sustained sound, in other words, without a sustain phase, the difference between decay and release isn’t necessarily noticeable…The last phases continue without us being able to perceive them (fig. 16).
S.R.
S For Sustain
It’s the period during which energy is maintained.
There are two possibilities. The first is a sound that is maintained, as in figure 15: with wind instruments, for example, you have to keep blowing to sustain the note. The second is a dying sound: once a drum skin is struck, the sound lasts only as long as the energy accumulated by the skin holds out.
The first category creates harmonic spectra, the second non-harmonic spectra (see the article on timbre).
R For Release
Once there’s no more energy coming in, the sound dies down until it extinguishes itself. For non-sustained signals, this phase just continues from the last phase. Release is rather complex.
It starts, in theory, once the musician doesn’t have any more control over the signal and finishes when the energy producing the sound is completely exhausted.
On a piano, if the sustain pedal is held down, the note keeps sounding after the key is released, letting the strings vibrate naturally until they stop.
We also have to consider that the room affects decay. The same instrument played in a small room won’t sound the same, in terms of decay, as it would in a cathedral. The trailing effect of a long release might sound nice on certain synthetic sounds, but it is difficult to control during recording.
Synthesize An Envelope
There are two specific cases for manipulating ADSR. The first brings us to the synthesizer.
The first analog models were designed so that each section of a sound could be adjusted, modeled on the behavior of acoustic instruments.
In a synthesizer, the envelope generator (EG) controls a VCA (Voltage Controlled Amplifier), so that the sound’s level follows the ADSR settings.
This means the oscillator produces a constant-level signal that is routed to a VCA. The VCA receives its instructions from the ADSR: its gain follows the control voltage it receives.
That control voltage is delivered by the ADSR. The envelope in fig. 2 would correspond to someone turning an amp’s volume knob from zero up to the chosen level (A), turning it down quickly (D) to a second, stable level (S), and then turning it down progressively to zero (R).
In other words, ADSR controls let us adjust three times and one level. The times are the rise from zero to the peak (A), the fall from the peak to the sustain level (D), and, once the note is released, the fall from the sustain level back to zero (R). The Sustain control itself sets a level, not a time.
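Here is a minimal sketch of an ADSR envelope feeding a VCA, as described above. It assumes straight-line segments; many synthesizers use exponential curves instead. The function names and parameter values are hypothetical. The release starts from whatever level the envelope had reached when the key was released, so a short note released during the attack still fades out smoothly.

```python
import math

def adsr(t, attack, decay, sustain, release, note_off):
    """Linear ADSR envelope value (0..1) at time t, in seconds.

    attack, decay, release: segment durations in seconds (must be > 0).
    sustain: level (0..1) held while the note is on.
    note_off: time at which the key is released.
    """
    def held_level(time):
        # Envelope while the key is held: rise to 1, fall to sustain, then hold
        if time < attack:
            return time / attack
        if time < attack + decay:
            return 1.0 - (1.0 - sustain) * (time - attack) / decay
        return sustain

    if t < note_off:
        return held_level(t)
    # Release ramps down from the level reached at note_off
    start = held_level(note_off)
    return max(0.0, start * (1.0 - (t - note_off) / release))

def vca(oscillator, envelope):
    """A VCA multiplies the constant-level oscillator signal by the envelope."""
    return [s * e for s, e in zip(oscillator, envelope)]

fs = 1000  # low sample rate, just for illustration
times = [n / fs for n in range(1500)]
env = [adsr(t, attack=0.1, decay=0.2, sustain=0.6, release=0.4, note_off=1.0) for t in times]
tone = vca([math.sin(2 * math.pi * 50 * t) for t in times], env)
```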
A Little Bit Of History
This approach was long considered sufficient, even if a certain amount of approximation had to be accepted. The peak level at the end of A and the zero level at the end of R can’t be changed; only the Sustain level is adjustable.
When the DX7 came out in the early 80s, its envelope generators finally separated level from time, with four rates and four levels (eight parameters) per envelope (fig. 17).
Such a system brought new possibilities for adjustment and sound creation, yet recreating transients with simple envelopes is still unsatisfactory. Digital sampling (Roland D-50) made it possible to use real sampled attacks with synthetic sounds.
All manufacturers then used equivalent technologies on their instruments. These days, physical modeling lets us recreate an instrument’s original envelope and its interaction with the generated sound. ADSR parameters are still present to allow adjustment of the sounds’ dynamic envelopes.
Compression
The second case in which a musician-technician may have to manipulate an envelope generator is the compressor. A compressor usually has envelope adjustments that change how quickly the compression acts.
Depending on the gear, you’ll usually find an Attack adjustment, which sets how quickly compression kicks in once the signal exceeds the threshold.
With a slow setting, the compression is much more discreet: it applies a certain amount of compression without reacting too sensitively (for classical music, for example).
On the other hand, any sudden peaks corresponding to short attacks will escape treatment. A short Attack setting lets the compressor react almost instantly, but the typical compressed, “punchy” sound will be heard. In today’s music this can be a desirable and interesting effect, if used in moderation.
You can also adjust the Release, which sets the time it takes the compressor to bring the level back to its initial value. As with the attack, a medium setting is more discreet and brings the level back more gently. Conversely, a release set to zero can produce an unpleasant undulating (“pumping”) effect if the compressor kicks in often.
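To make the Attack and Release settings less abstract, here is a simplified feed-forward compressor sketch. It assumes a per-sample level detector, whereas real units typically use peak or RMS detectors. It also assumes a hard-knee gain curve and one-pole smoothing of the gain reduction. The parameter names and defaults are illustrative, not those of any particular product.

```python
import math

def compress(samples, fs, threshold_db=-20.0, ratio=4.0, attack_ms=10.0, release_ms=100.0):
    """Feed-forward compressor sketch with attack/release smoothing of the gain.

    Levels above threshold_db are reduced according to `ratio`; the attack and
    release times set how fast the gain reduction moves toward that target.
    """
    # One-pole smoothing coefficients: longer time -> coefficient closer to 1 -> slower
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    gain_db = 0.0  # current (smoothed) gain change in dB, always <= 0
    out = []
    for x in samples:
        level_db = 20 * math.log10(max(abs(x), 1e-9))
        over = level_db - threshold_db
        target_db = -over * (1 - 1 / ratio) if over > 0 else 0.0
        # Moving toward more reduction uses the attack time; recovering uses release
        coeff = att if target_db < gain_db else rel
        gain_db = coeff * gain_db + (1 - coeff) * target_db
        out.append(x * 10 ** (gain_db / 20))
    return out
```

Lengthening `attack_ms` lets short peaks slip through before the gain drops. Shortening `release_ms` makes the gain recover faster, which is where pumping can come from.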
For more audio/sound related content and resources, go to AudioFanzine.