ProSoundWeb – Study Hall

Keeping It Sonically Simple: A “KISS” Approach To Capturing Quality Narration Sound

There are four basic (yet key) areas of microphone technology and techniques to consider – let’s call them Pattern, Position, EQ and Processing.

Recently, I’ve been asked to improve the audio of several customers who are new to webcasting and Zoom meetings. At first glance, one would think this is a super-easy task. Just stick a mic in front of the talent and go for it.

However, for many first-time webcasters who find themselves doing their own production, the proper setup and processing of a single narration microphone can be a complete mystery. After all, there are so many choices. Go with a USB mic or one with an XLR output (or both)? If it has an XLR, just what do you plug it into? Just what are those “Cloudlifter” devices?

Further, there are so many processing plugins available, how many of them do you need to “plug in”? The equalization possibilities are confusing and infinite, and how about all those dynamic compressors? And shouldn’t reverb be added to everything? Topping it off, that fancy mic you just paid a lot of money for? It probably has several switches with cryptic markings.

All folks want is decent sounding audio, but there are so many options that they can easily find themselves going down a rabbit hole.

Breaking It Down

There are four basic areas of microphone technology and techniques to consider. Let’s call them Pattern, Position, EQ and Processing.

But first things first – it all starts at the microphone… A quality model, set to the right polar pattern and positioned properly can take us 90 percent of the way to getting quality narration sound.

Polar patterns are important for one simple reason. You probably don’t have a perfectly quiet room free of any reflective surfaces. So, the goal is to choose a microphone that picks up sound from one direction while ignoring the sounds from all other directions.

Microphone polar patterns ranging from omnidirectional on the far left to bi-directional on the far right.

The most useful mic pickup pattern for vocal narration is the basic cardioid. Nothing fancy here, plus it also produces a bit of proximity-effect bass boost on male vocals when they’re within six inches or so of the mic. Don’t believe me? Listen to what the omnidirectional mic built into your USB camera sounds like.

My favorite mic for webcast narration is the Shure MV7. It’s a dynamic design with a great pop filter plus a standard cardioid pickup pattern. It can be plugged directly into a USB port of a computer (which is what I typically do), or its XLR output can be plugged into the mic preamp of choice.

The Shure MV7 dynamic mic with companion ShurePlus MOTIV app.

It also includes a very robust headphone amp, so users can hear what they’re doing while on headphones. Finally, the company offers the free ShurePlus MOTIV app that allows tailoring the frequency response and dynamic compression without need for additional in-line or post processing.

Multi-pattern mics are generally studio types with switchable patterns – omnidirectional, wide cardioid, cardioid, supercardioid, and figure-8 (a.k.a., bi-directional). Figure 1 shows the switch on the iconic AKG C 414. It certainly wouldn’t be a good idea to select the omni pattern on the far left because it will pick up a lot of room reflections as well as things like computer fan noise. Meanwhile, the bi-directional pattern on the far right will pick up everything directly behind the mic as well as the narrator’s voice.

The AKG C 414 is equipped with a switch for changing polar patterns.

In my view, the middle switch selection of standard cardioid pattern will work best in capturing a voice while minimizing other noise. From there, give it a little bass boost (particularly for many male voice signatures).
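As a side note, all of these first-order patterns come from one simple recipe: a mix of an omni response and a figure-8 response. A minimal Python sketch (the function name is mine, purely for illustration) shows why the cardioid setting rejects what’s behind the mic:

```python
import math

def pattern_sensitivity(theta_deg, alpha):
    """Relative pickup of a first-order polar pattern at angle theta_deg.

    alpha = 1.0 -> omnidirectional, 0.5 -> cardioid, 0.0 -> figure-8.
    """
    theta = math.radians(theta_deg)
    return alpha + (1.0 - alpha) * math.cos(theta)

# A cardioid (alpha = 0.5) has a deep null directly behind the mic...
print(round(pattern_sensitivity(180, 0.5), 3))   # 0.0
# ...while a figure-8 (alpha = 0.0) nulls the sides instead.
print(round(pattern_sensitivity(90, 0.0), 3))    # 0.0
```

That rear null is exactly what keeps computer fans and room reflections out of a narration track.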

Many less expensive multi-pattern mics have a polar pattern switch on one side, most often offering omnidirectional, cardioid, and bi-directional. These mics also usually provide a bass roll-off and a -20 dB pad. Unless something really loud is being recorded (like a screaming voice, electric guitar cabinet or snare drum), don’t engage the pad; at normal speech levels it simply raises the noise floor rather than preventing internal distortion from loud sources.
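To put that -20 dB pad in perspective, decibels map to amplitude by a factor of ten for every 20 dB. A quick sketch of the standard conversion:

```python
def db_to_gain(db):
    # Standard amplitude conversion: gain = 10^(dB / 20).
    return 10 ** (db / 20.0)

# A -20 dB pad knocks the signal down to one tenth of its amplitude,
# which is why it also drags a quiet voice down toward the noise floor.
print(db_to_gain(-20))   # 0.1
print(db_to_gain(-6))    # roughly 0.5
```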

Getting Into Position

Don’t assume that sticking the microphone anywhere in the room is going to work. (I blame Captain Kirk from Star Trek for being able to speak from the big chair with no microphone in sight – with zero echo or room noise.)

The optimum mic position is about four to six inches from the mouth, and off to the side a bit so the narrator doesn’t blow directly into the mic, which can create p-pops (plosives). I like to position four fingers between my lips and the mic for consistency and to attain plenty of level. I also place it an inch or two below lip level, angled up a bit, and 30 to 45 degrees off-axis. (The photo that opens this article provides an example of good positioning.)

This is the standard radio DJ microphone position, which has worked well for many decades. Not only does it remove p-pops, but it also allows narrators to easily watch themselves on a video monitor and read their script while allowing the webcasting audience to see their face. All good!

If you need to get the mic any further away and/or are dealing with a very soft voice, then an XLR mic may require the aforementioned Cloudlifter (from Cloud Microphones), especially when plugging into an inexpensive mic preamp. This device is basically a very low-noise pre-pre-amp that boosts the level and will help reduce any hiss (white noise) that can show up at really high gain levels.

The Cloudlifter CL-1 can provide up to +25 dB of gain.

In my own experience, I’ve never needed one for XLR mics I use since I have a pretty loud voice and can keep the mic within six inches of my mouth, plus I’m told I have perfect enunciation and pronunciation from 12 years of parochial school (thank you, Sister Charles).

Go Gently

With a quality mic in the correct position, only minor equalization should be needed. It’s highly likely that the bass frequencies below 80 to 100 Hz will need a cut to help reduce any HVAC rumble in the room as well as attenuate those pesky p-pops. And if the mic has a pop filter, use it; if not, add a simple windscreen.

If the application is going to be live webcasting (as in Zoom meetings), you can often select a bass roll-off on the mic itself. This usually looks like a little hockey-stick selection. Unless the narrator is doing Tibetan Throat Singing, there’s really no useful sonic energy below 100 Hz from most male voices, and it can be even higher (150 to 200 Hz) for typical female voices.

In addition, some higher-end mics (like the Shure SM7B) also have a presence/high-frequency boost selection, which creates a 5 dB boost from 1 kHz up to 8 kHz. This is exactly the right part of the spectrum to improve vocal clarity without introducing too much sibilance.

Basic vocal EQ for dynamic mics.

If the voice is being recorded for later webcasting, there’s an infinite number of equalization possibilities. But don’t think you need to use everything at the same time – simple is better. Cut frequencies as much as possible below 100 Hz, and particularly if it’s a dynamic mic, boost from 3 kHz to 8 kHz. (Most condenser models already have this 8 kHz boost, so it won’t be necessary to add it twice in the EQ strip.) The general rule is to apply no more than 6 dB of boost at any frequency, since more can result in distortion/clipping trouble.

The bottom line is that if the mic is already providing low-cut and high-boost capability via switches, there’s no need to replicate either one with the channel EQ. Choose one or the other, but not both mic EQ and channel EQ. How much EQ is correct? Use your ears to compare test voice recordings, and also compare them with samples from other quality webcasting sources.
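For those doing their EQ in software, that low cut is just a high-pass filter. Here’s a minimal one-pole high-pass sketch in plain Python – illustrative only, since any DAW or plugin filter will be steeper and better behaved:

```python
import math

def highpass(samples, cutoff_hz=100.0, sample_rate=48000):
    """One-pole high-pass: rolls off energy below cutoff_hz (rumble, p-pops)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)   # difference equation of an RC filter
        out.append(y)
        prev_x, prev_y = x, y
    return out

# A constant (0 Hz) offset -- HVAC rumble taken to the extreme --
# decays away almost completely after a short stretch of samples.
filtered = highpass([1.0] * 2000)
print(abs(filtered[-1]) < 0.001)   # True
```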

What’s This Dynamics Thing?

We used to call it compression, but that was often confused with data compression (as in MP3 conversion). And let’s not confuse level dynamics with dynamic and condenser mic elements. (That’s an article all in its own.)

What we want to do is limit the dynamic range (the difference between the loud and soft levels of the voice) without all the audio pumping artifacts that are all too easy to create and sound terrible.

I teach my students that there are two basic controls in any dynamic compressor that should be understood. The most important control is the threshold level, which is very much like the height adjustment for the blade of a lawn mower (really it is). Just like mowing, any grass (or audio levels) higher than the blade height (threshold level) will be chopped off, while any grass (or audio levels) shorter than the blade height (or threshold) will be left alone and not cut.

The key to setting this control is to start at the highest setting, then start moving it down a little at a time until you’re just cutting off the top of the grass (loud audio parts) without making a bald spot in the audio landscape. You can see how much grass (audio level) is being cut by watching the GR (gain reduction) meter on many dynamic plugins or apps. Aim for, at most, 6 to 10 dB of gain reduction between loud and soft speaking parts.

Basic vocal dynamic compression.

The other key control is the adjustment for compression ratio. Generally, somewhere between 3:1 and 4:1 is a good choice for narration vocals. Make it too high (as in 8:1 or higher) and you’ll squash all the speech dynamics, making it sound like a robot. Make it too low (as in 2:1 or lower) and it won’t control the peak sound levels.
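The lawn mower analogy translates directly into numbers. A hypothetical static gain computation (the names are mine, not from any particular plugin) shows how threshold and ratio interact:

```python
def gain_reduction_db(level_db, threshold_db=-18.0, ratio=3.0):
    """How many dB a compressor turns the signal down at a given level."""
    if level_db <= threshold_db:
        return 0.0          # below the "blade height": left alone
    overshoot = level_db - threshold_db
    # A 3:1 ratio lets only 1 dB out for every 3 dB of overshoot.
    return overshoot - (overshoot / ratio)

# A peak 12 dB over the threshold at 3:1 gets 8 dB of gain reduction --
# what you'd see on the GR meter...
print(gain_reduction_db(-6.0))    # 8.0
# ...while quiet passages below the threshold pass untouched.
print(gain_reduction_db(-30.0))   # 0.0
```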

In many dynamic processors, there’s also the Knee, which helps in softening the impact of the compressor and makes it less noticeable. Now, if you really prefer the sound of a compressor hitting the voice hard (as in metal vocals) that’s OK, but I’ve found a hard knee is too jarring for stand-alone vocal narration. As always, let your ears be the guide, and do comparison listening to other quality sources.
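The knee simply rounds off the corner at the threshold. A sketch using a common quadratic soft-knee formulation (parameter names are my own):

```python
def soft_knee_reduction_db(level_db, threshold_db=-18.0, ratio=3.0, knee_db=6.0):
    """Gain reduction in dB with a quadratic soft knee around the threshold."""
    half = knee_db / 2.0
    over = level_db - threshold_db
    if over <= -half:
        return 0.0                        # well below threshold: untouched
    if over >= half:
        return over - over / ratio        # well above: full-ratio compression
    # Inside the knee, blend quadratically so there's no abrupt corner.
    return (1.0 - 1.0 / ratio) * (over + half) ** 2 / (2.0 * knee_db)

# Right at the threshold a hard knee would do nothing, but a 6 dB
# soft knee is already easing in a little gain reduction.
print(round(soft_knee_reduction_db(-18.0), 2))   # 0.5
```

The gentler transition is exactly why a soft knee sounds less jarring on stand-alone narration.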

Wrapping It Up

A word of caution: don’t be tempted to use a “bit-cruncher” app on a narration voice. And unless you’re doing a spooktacular Halloween effect, don’t be tempted to add any reverb or echo. While they can be cool effects for a singing voice with music around it, they really don’t work well for podcast, webcast and online meeting room applications.

My first rule for all processing is the Hippocratic audio oath of, “First, Do No Harm.” Select the right pattern for the mic, set it correctly, and use the most minimal amount of processing possible. Less is more, so if you can hear the processing, it’s probably a good idea to back it down a little bit. Remember, you’re trying to get the message across, not show off the latest Godzilla processing app.
