In-Depth Primer: Speech Intelligibility In Sound Reinforcement
If even a modest amount of information is garbled or missing, the brain can’t decipher the message...

Section 1: Introduction

Most people have had this experience:

You’re driving along in your car, windows down and the radio playing. It’s a new song, one you’ve never heard before by an artist you don’t recognize, and you’ve got to get the name so you can buy the disc. The music ends, the announcer comes on and . . .

. . . you can’t understand him over the road noise.

As this simple example illustrates, there’s an important difference between music and speech. The brain is capable of “filling in” a fair amount of missing information in music, because music has a high degree of redundancy. (If you didn’t catch the bass line in the first four measures, you’ll pick it up when it repeats in the next four.) Speech, by contrast, is rich in constantly changing information and has less redundancy than music. If even a modest percentage of the information is garbled or missing, the brain can’t decipher the message.

Speech communication systems are therefore subject to more stringent requirements than music systems. These pages discuss speech intelligibility in sound reinforcement: what it is, what affects it, and how it’s measured.

The Speech Signal

Human speech is a continuous waveform with a fundamental frequency in the range of 100 to 400 Hz. (The average is about 100 Hz for men and 200 Hz for women.) At integer multiples of the fundamental is a series of harmonics whose changing relative strengths are shaped by resonances of the vocal tract known as “formants.”

Formants create the various vowel sounds and transitions among them. Consonant sounds, which are impulsive and/or noisy, occur in the range of 2 kHz to about 9 kHz. (Below is a vocal spectrum graph for male and female speakers with an “idealized” human vocal spectrum superimposed.)

[Image: Vocal spectrum for male and female speakers, with an idealized human vocal spectrum superimposed.]
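To make this structure concrete, here is a minimal Python sketch of the idea described above: harmonics at integer multiples of a 100 Hz fundamental, weighted by a crude formant envelope. The formant center frequencies and bandwidths are assumed, approximate values for an /a/-like vowel, not figures taken from this article.

import numpy as np

fs = 16000                      # sample rate, Hz
f0 = 100                        # fundamental: average male speaking pitch
t = np.arange(0, 0.3, 1 / fs)  # a 300 ms, vowel-length segment

# Assumed formant centers and bandwidths (Hz) for an /a/-like vowel;
# approximate textbook values, used here only for illustration.
formants = [(700, 130), (1220, 70), (2600, 160)]

def formant_gain(f):
    """Crude spectral envelope: a sum of resonance-shaped peaks."""
    return sum(1.0 / (1.0 + ((f - fc) / bw) ** 2) for fc, bw in formants)

# Harmonics sit at integer multiples of f0; the envelope sets their strengths.
signal = sum(formant_gain(k * f0) * np.sin(2 * np.pi * k * f0 * t)
             for k in range(1, fs // (2 * f0)))
signal = signal / np.max(np.abs(signal))   # normalize to full scale

Changing the formant table while holding f0 fixed produces different vowel-like timbres, which is exactly the mechanism the paragraph above describes: the pitch stays put while the resonances move.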

The sound power in speech is carried by the vowels, which average from 30 to 300 milliseconds in duration. Intelligibility is imparted chiefly by the consonants, which average from 10 to 100 milliseconds in duration and may be as much as 27 dB lower in amplitude than the vowels. The overall level of the speech signal varies, and the relative strengths of individual frequency ranges shift as the formants change.
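As a quick back-of-envelope check (these ratios are computed here, not quoted from the article), a consonant sitting 27 dB below a vowel has only about 1/22 of its amplitude and roughly 1/500 of its power:

# Convert the "up to 27 dB lower" consonant-to-vowel difference
# quoted above into linear ratios.
level_db = -27.0
amplitude_ratio = 10 ** (level_db / 20)   # ~0.045, about 1/22
power_ratio = 10 ** (level_db / 10)       # ~0.002, about 1/500

print(f"amplitude ratio: {amplitude_ratio:.3f}, "
      f"power ratio: {power_ratio:.4f}")

This helps explain why a system that sounds loud enough on vowels can still fail to deliver the consonants that carry intelligibility.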

Speech Comprehension

The listener’s challenge is to parse speech sounds into meaningful units of language, a complicated task. Gaps in the sound don’t necessarily correspond to word or syllable breaks. Nor are speech sounds discrete events: they merge and overlap in time, and the articulation of a given phoneme differs with context and from speaker to speaker.

In fact, the precise ways in which the ear-brain mechanism decodes speech remain something of a mystery. Such factors as loudness, duration and spectral content certainly affect speech perception, but how they may interact is not fully understood.

Diminished intelligibility is associated with a loss of information that is encoded in a number of highly interactive elements, and many factors influence it. Background noise can mask the speech, and both the direction of the source relative to the listener and the direction of the interfering noise can alter the degree of masking. Intelligibility is also affected by the predictability of the message, the speaker’s enunciation and, not least, the acuity of the listener’s hearing.
