
Keeping it Real II: In-Ear Monitoring And The Acoustic Reflex Threshold

Considering some of the complex mechanisms performed by the human brain that affect localization and our perception of loudness.

Editor’s note: This is part two of a three-part series. Part one was featured in LSI October 2018 and can also be found on ProSoundWeb.

Have you ever noticed how you and the band can take a break from rehearsing, come back half an hour later, and when you put in your in-ear monitors (IEMs), everything feels louder? And then how, after a few moments, it settles down and feels normal again?

It’s because of a reflex action of the stapedius muscle in the middle ear. When this little muscle contracts, it pulls the stapes or “stirrup bone” slightly away from the oval window of the cochlea, against which it normally vibrates to transmit pressure waves for conversion into nerve impulses. This action, a response to sounds between 70 and 100 dB SPL, effectively acts as a compressor, resulting in roughly a 20 dB reduction in what we hear.

However, the muscle can’t stay fully contracted for long periods, so after a few seconds the tension drops to around 50 percent of the maximum. The initial reaction, at about 150 milliseconds (ms), is not fast enough to fully protect the ear against very loud and sudden transient sounds, but it does help reduce hearing fatigue over longer periods.
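To picture the reflex in the compressor terms used above, here’s a minimal Python sketch that simply maps the figures from the text (a roughly 70 dB SPL onset, about 20 dB of reduction, a 150 ms reaction time, relaxing to around half tension after a few seconds). The three-second relaxation point is an assumption for illustration; this is not a physiological model.

```python
def reflex_gain_reduction_db(level_db_spl, seconds_since_onset):
    """Rough compressor-style sketch of the acoustic (stapedius) reflex.

    Uses the approximate figures described above; an illustration only,
    not a physiological model.
    """
    # Below the reflex threshold, or faster than the ~150 ms reaction time:
    # no reduction, which is why sudden transients get through unattenuated.
    if level_db_spl < 70 or seconds_since_onset < 0.150:
        return 0.0
    max_reduction_db = 20.0
    # After a few seconds (assumed here as 3 s) the muscle can't hold full
    # contraction and the tension falls to roughly half.
    if seconds_since_onset > 3.0:
        return max_reduction_db * 0.5
    return max_reduction_db
```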

The middle ear, including the location of the stapedius muscle.

Interestingly, this reflex also occurs when a person vocalizes, which helps to explain why a singer’s IEM mix of the band might sound loud enough in isolation, but when they start singing they find that they need more instrumentation.

This happens in conjunction with the fact that they’re hearing themselves not only via the mix but also through bone conduction in the skull. It’s well worth trying to sing along to an IEM mix you’ve prepared for a singer to experience what this feels like for them, because it’s a very different sensation from simply shouting down the microphone to EQ it.

The acoustic reflex threshold also means that transients appear quieter than sustained sounds of the same level, and it’s the thinking behind a compression trick that is often used in studios and film production.

Compressing the decay of a short sound such as a drum hit fools the brain into thinking the hit as a whole is significantly louder and punchier than it really is, even though the peak level – the transient – has not changed.
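As a quick offline illustration of the idea, here’s a minimal numpy sketch that leaves a drum hit’s transient untouched and lifts its decay. It assumes a mono hit as a floating-point array at 48 kHz; the 10 ms attack window and 6 dB of decay gain are illustrative values, not a recipe.

```python
import numpy as np

def lift_decay(hit, sample_rate=48000, attack_ms=10, decay_gain_db=6.0):
    """Raise the decay of a drum hit while leaving the initial transient
    (and therefore the peak level) untouched, so the hit reads as louder."""
    n = len(hit)
    attack = min(int(sample_rate * attack_ms / 1000), n)
    gain = 10 ** (decay_gain_db / 20)

    envelope = np.full(n, gain)
    envelope[:attack] = 1.0                     # transient: unity gain
    ramp_end = min(2 * attack, n)
    # Short ramp between the two regions so there is no audible step.
    envelope[attack:ramp_end] = np.linspace(1.0, gain, ramp_end - attack)
    return hit * envelope
```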

I advocate caution if you’re going to try this in a monitor mix – drummers need to hear what their drums actually sound like, and getting things such as drum tuning and mic placement correct at the source is vital – but it’s an interesting thing to be aware of.

All In The Timing

Our ability to perceive sounds as separate events depends not only on there being a sufficient difference between them in frequency, but also on timing. This phenomenon is known as the “precedence effect” or the “Haas effect.”

This effect describes how, when two identical sounds are presented in quick succession, they’re heard as a single sound. This perception occurs when the delay between the two sounds is between 1 and 5 ms for single click sounds, but up to 40 ms for more complex sounds such as piano music.

When the lag is longer, the second sound is heard as an echo. A single reflection arriving within 5 to 30 ms can be up to 10 dB louder than the direct sound without being perceived as a distinct event.

In 1951, Helmut Haas examined how the perception of speech is affected in the presence of a single reflection. He discovered that a reflection arriving later than 1 ms after the direct sound increases the perceived level and spaciousness (more precisely, the perceived width of the sound source), without being heard as a separate sound. This holds true up to about 20 ms, at which point the sounds become distinguishable.

This can be an interesting experiment to try with a vocal mic and your IEMs. Split the vocal mic down two channels, delay one input somewhere between 1 and 20 ms, and evaluate what you notice. Then try panning one input hard left and the other hard right to hear how the vocal sounds thicker and creates a sense of width and space.

Play with the delay time and you’ll find that if it’s too short the signal starts to phase; too long and you lose the illusion. This trick does make the signal susceptible to comb filtering if the inputs are summed back to mono, especially at shorter delay times, so be aware of that.
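If you want to hear the effect away from the console first, here’s a minimal numpy sketch of the same experiment. It assumes a mono vocal as a floating-point array at 48 kHz; the 12 ms delay is just one example value inside the 1 to 20 ms window, and the mono sum exposes the comb filtering mentioned above.

```python
import numpy as np

def haas_pair(vocal, sample_rate=48000, delay_ms=12.0):
    """Split a mono signal into a stereo pair with one side delayed by a few
    milliseconds; panned hard left/right, the two copies fuse into a single,
    wider image rather than being heard as an echo."""
    delay = int(sample_rate * delay_ms / 1000)
    left = np.concatenate([vocal, np.zeros(delay)])
    right = np.concatenate([np.zeros(delay), vocal])   # delayed copy
    return np.stack([left, right], axis=1)             # (samples, 2) buffer

def mono_sum(stereo):
    """Sum the pair back to mono, which reveals the comb-filter notches,
    especially at shorter delay times."""
    return stereo.mean(axis=1)
```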

Once again I recommend extreme caution in using this in an IEM mix, as “tricking” a singer in this way can backfire!

However, it’s a useful principle to be aware of if you have the opportunity to get creative with other sounds, and I use it a lot when adding pre-delay to a reverb. No pre-delay creates a feeling of immediacy to the effect, but just 5 to 10 ms creates a slight sense of space. If you’re after a little more breathiness and drama – “vampires swirling” as I once heard it described – try increasing the pre-delay up to 20 ms and listen to how it changes.

The Haas (or precedence) effect. The ear focuses on the direction of the sound that arrives first and not on the reflections, provided they arrive within about 30 ms of the first sound; reflections arriving within that window combine with the perception of the first arrival.

The Haas effect is also something to be very aware of when IEM mixing due to digital latency.

Every time we take a signal out of the console and send it somewhere else in the digital domain, a small amount of time delay, known as latency, is introduced.

Different processing devices introduce different amounts of latency, and obviously the less, the better. The more devices we add, the more the latency stacks up.
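As a purely hypothetical illustration of how quickly it stacks up, the stage names and figures below are assumptions for the sake of the arithmetic, not measurements of any particular console or system.

```python
# Hypothetical latency budget for a vocal-to-IEM path; every figure is an
# assumed example, not a measurement of any specific device.
stages_ms = {
    "console input-to-output processing": 1.0,
    "external plugin round trip (A/D, processing, D/A)": 2.5,
    "digital IEM transmission": 2.0,
}

total_ms = sum(stages_ms.values())
print(f"Total added latency: {total_ms:.1f} ms")  # 5.5 ms in this example
```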

While a few milliseconds of latency may be totally imperceptible for, say, a guitarist, it’s a different matter when it comes to vocals.

Singers will often be able to perceive that something isn’t quite right without being able to put their finger on it, because when we vocalize and have that signal returned to our ears, the discrepancy between the moment of making the sound and the moment of it returning becomes heightened in our awareness. This is something to be vigilant about when dealing with any digital outboard, such as plugins, with singers.

Location Services

The Haas effect also impacts where we perceive a sound to be coming from – the apparent location of the source is determined by the sound that arrives first, even though the two sounds may come from different physical locations. This holds true until the second sound is about 15 dB louder than the first, at which point the perception of direction changes.

Sound localization is a very complex mechanism performed by the human brain. It’s not only dependent on the directional cues received by the ears, but it is also intertwined with the other senses, especially vision and proprioception.

Our ability to determine a sound’s location and distance is called binaural hearing, and in addition to all the psychoacoustic effects discussed so far, it’s also heavily influenced by the physical shape of our heads, ears, and even torsos.

The outer ear or “pinna” functions as a directional sound collector that funnels sound waves into the ear canal. The head and the topography of our face and torso influence how sounds from any position other than a 0-degree angle are heard, as they create an acoustic “shadow.”

Our brains process the differences between the information that our two ears collect and interpret the results to determine where a sound is coming from, how far away it is, and whether it’s still or moving.

At lower frequencies, below about 2 kHz, this is mostly determined by the inter-aural time difference; that is, the discrepancy in time between when the sound reaches each ear. Above 2 kHz the information gathered comes from the inter-aural level difference; that is, the discrepancy in volume between the sound that each ear hears. This clever evolutionary adaptation is due to the relative lengths of sound waves at different frequencies.

For frequencies below 800 Hz, the dimensions of the head are smaller than half the wavelength of the sound, so the brain can evaluate phase delays between the ears. For frequencies above 1.6 kHz, however, the dimensions of the head are greater than the wavelength, so a determination of direction based on phase alone is no longer possible; instead, we rely on the level difference between the two ears. This division of cues is known as duplex theory, and it plays an important role in sound localization in the horizontal plane.

Finally, if the frequency drops below 80 Hz, it becomes difficult or impossible to use either time difference or level difference to determine a sound’s lateral source, because the phase difference between the ears becomes too small for a directional evaluation – hence the experience of sub-bass frequencies being difficult to localize.
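To put rough numbers on this, here’s a small Python sketch comparing half-wavelengths against an assumed ear-to-ear distance of about 0.18 m; both the head width and the speed of sound are approximations used only for illustration.

```python
SPEED_OF_SOUND = 343.0   # m/s, approximate value in air at room temperature
HEAD_WIDTH = 0.18        # m, an assumed average ear-to-ear distance

def half_wavelength_m(freq_hz):
    return SPEED_OF_SOUND / freq_hz / 2

for freq in (80, 800, 1600, 4000):
    hw = half_wavelength_m(freq)
    cue = "time/phase (ITD)" if hw > HEAD_WIDTH else "level (ILD)"
    print(f"{freq:>5} Hz: half-wavelength = {hw:.2f} m -> dominant cue: {cue}")

# Below roughly 80 Hz the interaural phase difference becomes too small a
# fraction of the cycle to evaluate, which is why sub-bass is hard to localize.
```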

Making Distinctions

While this phenomenon makes it easy to sense which side a sound is coming from, it’s harder to determine direction in the up/down and front/back planes due to our ears being placed at the same horizontal level as each other. Some types of owls have ears placed at different heights to allow for greater efficiency in finding prey when hunting at night, but humans have no such facility.

This can result in “cones of confusion,” where we’re unsure of the elevation of a sound source because all sounds lying in the mid-sagittal plane produce similar interaural differences; however, once again the shapes of our bodies help us out. Imagine a sound source directly in front of you. The reflection off the torso takes a certain detour, and so arrives at both ears with a certain delay relative to the direct sound.

This yields a slight comb filter pattern that changes if the source is elevated. The same is true if the source moves behind us; the torso reflection changes, and our brains process these discrepancies to help us locate the source.
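Since direct sound plus a single reflection is exactly a comb filter, a short numpy sketch can show why a change in the reflection path shifts the pattern; the delay and reflection gain values here are illustrative, not measured body dimensions.

```python
import numpy as np

def comb_magnitude(reflection_delay_ms, reflection_gain=0.5,
                   sample_rate=48000, n_fft=4096):
    """Magnitude response of direct sound plus one torso-style reflection.

    A different reflection path (source elevated, or moved behind the
    listener) means a different delay, which shifts the notch pattern: one
    of the spectral cues the brain uses for elevation and front/back."""
    delay = int(sample_rate * reflection_delay_ms / 1000)
    impulse = np.zeros(n_fft)
    impulse[0] = 1.0                    # direct sound
    impulse[delay] += reflection_gain   # single delayed reflection
    return np.abs(np.fft.rfft(impulse))

# Example: compare the notch pattern for two slightly different path lengths.
in_front = comb_magnitude(0.6)
elevated = comb_magnitude(0.9)
```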

In the final installment of this series, we’ll look at a ground-breaking new technology that takes IEM mixing to a whole new dimension.
