Did You Say Rock Or Wock? Native Language & Speech Intelligibility Problems

Possible Solutions
It is unclear whether this language-based influence on speech intelligibility is noise dependent or whether, in any environment, a lack of phoneme distinction fundamentally prevents some of what is said from being understood. In either case, some thought and analysis can yield better speech intelligibility criteria than those that currently exist.

One enhancement is to limit the number of words that are used in emergency announcements and familiarize the populace with this limited word list. This reduced emergency list would be an acoustical analog to the universal symbol for choking.

This would be a tremendous benefit because of our inherent ability to mentally insert phonemes that have been masked by noise when we are familiar with the context, or perhaps even already know the available words. This ability is known as verbal auditory induction.

“Verbal auditory induction (phonemic restoration) employs contextual information of speech in determining the identity of the missing sound. The restored phoneme is indistinguishable to the listener from those physically present…the apparent position of the extraneous sound can be made to drift forward or backward in the sentence, although its exact location remains unclear.” (Contemporary Issues in Experimental Phonetics, p. 412)

In other words, this mental insertion of a masked phoneme is so natural that the listener is unable to identify which phoneme was masked. The synthesized phoneme is inserted and recalled as if it had actually been heard.

Because a predictable message is more immune to noise, intelligibility indices increase dramatically in experiments. “The precision with which listeners identify speech elements is intimately related to the size of the vocabulary and to the sequential or contextual constraints that exist in the message. The percent correct is higher the more predictable the message, either by virtue of higher probability of occurrence or owing to the conditional probabilities associated with the linguistic and contextual structure.” (Speech Analysis, p. 303)

In addition, “…as vocabulary size increases, the signal-to-noise ratio necessary to maintain a given level of performance also increases.” (Speech Analysis, p. 304)

In general “… speech perception … is a process in which the detection procedure probably is tailored to fit the signal and the listening task. If the listener is able to impose a linguistic organization upon the sounds, he may use the information that is temporally dispersed to arrive at a decision about a given sound element. If such an association is not made, the decision tends to be made more upon the acoustic factors of the moment and in comparison to whatever standard is available.” (Speech Analysis, p. 306)

It would seem that, regardless of the native language of the listener, intelligibility would improve if the message possibilities were known and anticipated. This would be particularly true if the message were specifically chosen to use the most common phonemes, those readily recognizable and distinguishable in most languages (or at least in the languages present in the ethnic cross section of the region where a particular public building stands).

For example, if at a public assembly a warning signal were sounded to alert those in attendance that important instructions were to follow, and if the possible message choices were limited to perhaps three that had been made familiar to the audience earlier, then the chances that all would comprehend the message would go up dramatically.
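The benefit of a closed message set can be sketched in code. With only a few known messages, even a heavily degraded transmission can be matched to the correct candidate, which is loosely analogous to the listener's restoration process. The message list and the masked transmission below are invented for illustration; edit distance stands in for the listener's matching, not for any measured intelligibility index.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical limited emergency message set, announced to the
# audience in advance (the article's "perhaps three" choices).
MESSAGES = [
    "evacuate through the nearest exit",
    "remain seated and await instructions",
    "move away from the stage area",
]

def identify(heard):
    # Pick the known message closest to what was actually heard.
    return min(MESSAGES, key=lambda m: edit_distance(heard, m))

# Several phonemes masked by noise, yet the message is recovered
# because only three candidates are possible.
print(identify("evac__te thr__gh the n__rest ex_t"))
```

With an open vocabulary the same degraded string would be ambiguous; the small candidate set is what makes recovery reliable.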

This simple system could readily be employed and maintained consistently over a given geographical area; however, a national solution would certainly be more practical.

Additional methods of improving intelligibility involve raising the signal-to-noise ratio. The best way to do this consistently is to increase the signal level without introducing distortion. This could be implemented as an adaptive filtering scheme that adjusts the system's equalization in real time, optimizing it for the given message.

This optimization would involve eliminating most of the low- and high-frequency acoustic output while focusing on the vocal range. The level over this band could then be increased, and if appropriate devices had been chosen, the system could reproduce the message considerably louder than its normal playback level.
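As a sketch of the band-limiting idea: the vocal band is assumed here to span roughly 300 Hz to 3 kHz, and the center frequency and Q below are illustrative choices, not values from the article. A single biquad band-pass (coefficients per the widely used RBJ Audio EQ Cookbook formulas) passes a mid-band tone nearly unchanged while attenuating low-frequency content, freeing headroom to raise the level of the speech band.

```python
import math

def bandpass_biquad(fs, f0, q):
    # RBJ-cookbook band-pass (constant 0 dB peak gain) coefficients.
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def filt(x, b, a):
    # Direct-form I difference equation, one biquad section.
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = b[0]*s + b[1]*x1 + b[2]*x2 - a[1]*y1 - a[2]*y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))

# Center ~950 Hz, low Q for a broad band covering roughly 300 Hz-3 kHz.
fs = 8000
b, a = bandpass_biquad(fs, 949.0, 0.35)

tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]  # in-band speech energy
hum  = [math.sin(2 * math.pi * 60 * n / fs) for n in range(fs)]    # out-of-band rumble

# In-band tone passes near unity; the 60 Hz component drops roughly 15 dB,
# leaving headroom to drive the vocal band harder without distortion.
print(rms(filt(tone, b, a)[fs // 2:]), rms(filt(hum, b, a)[fs // 2:]))
```

A production system would cascade sections for steeper skirts and adapt the gain to the measured noise, but the principle is the same: spend the available undistorted output where the speech energy is.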

Conclusions
It is apparent that this phoneme phenomenon deserves additional attention, particularly in light of the ever-increasing ethnic diversity that we experience in these United States. The problem is understood, but its extent is not readily quantifiable. It would seem to possess the potential to be devastating.

Additional speech intelligibility research must be done with diverse subjects, and methods of counteracting the intelligibility degradation experienced by non-native speakers should be developed and employed, so that the safety of the general public remains the principal goal of a successful sound reinforcement system.

Jeff Rocha is vice president and general manager for EAW.

References
Beranek, Leo L.: Acoustics. Acoustical Society of America by American Institute of Physics, 1986.
Flanagan, James L.: Speech Analysis Synthesis and Perception. 2nd Edition. Springer-Verlag, Berlin, Heidelberg, New York, 1972.
Fry, Dennis.: Homo Loquens: Man as a talking animal. Cambridge University Press: Cambridge, London, New York, Melbourne, 1977.
Lass, Norman J., Ed.: Contemporary Issues in Experimental Phonetics. Academic Press: New York, San Francisco, London, 1976.
Lathi, B.P.: Modern Digital and Analog Communication Systems. 2nd Edition. Holt, Rinehart and Winston, Inc. Philadelphia, Fort Worth, Chicago, San Francisco, Montreal, Toronto, London, Sydney, Tokyo, 1989.
Olson, Harry F.: Acoustical Engineering. Professional Audio Journals, Inc.: Philadelphia, Pennsylvania, 1991.