Native language and speech
intelligibility problems
by Jeffrey A. Rocha


Possible Solutions
It is unclear whether this language-based influence on speech intelligibility
is noise-dependent or whether, in any environment, there is a fundamental
inability to understand some of what is being said due to a lack
of phoneme distinction. In either case, some thought and analysis
can yield better speech intelligibility criteria than those that
currently exist.
One enhancement is to limit the number of words that are used in
emergency announcements and familiarize the populace with this limited
word list. This reduced emergency list would be an acoustical analog
to the universal symbol for choking.
This would prove to be a tremendous benefit because of our inherent
ability to mentally insert phonemes that have been masked by noise
when we are familiar with the context, or perhaps even already aware
of the available words. This ability is known as verbal auditory
induction.
"Verbal auditory induction (phonemic restoration) employs contextual
information of speech in determining the identity of the missing
sound. The restored phoneme is indistinguishable to the listener
from those physically present ... the apparent position of the
extraneous sound can be made to drift forward or backward in the
sentence, although its exact location remains unclear." (Contemporary
Issues in Experimental Phonetics, p. 412)
In other words, this mental insertion of a masked phoneme is so
natural that the listener is unable to identify which phoneme had
been masked. The synthesized phoneme is inserted and recalled as
if it were actually heard.
Because a predictable message is largely immune to noise, intelligibility
indices measured experimentally increase dramatically as the message
becomes more predictable.
"The precision with which listeners identify speech elements
is intimately related to the size of the vocabulary and to the sequential
or contextual constraints that exist in the message. The percent
correct is higher the more predictable the message, either by virtue
of higher probability of occurrence or owing to the conditional
probabilities associated with the linguistic and contextual structure."
(Speech Analysis, p. 303)
In addition, "as vocabulary size increases, the signal-to-noise
ratio necessary to maintain a given level of performance also increases."
(Speech Analysis, p. 304)
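One way to see why a smaller vocabulary tolerates more noise is to count
how much information the listener must recover per word. The sketch below
is an information-theoretic illustration, not a model taken from the cited
text. It assumes that every word in the vocabulary is equally likely, and
both the function name `bits_per_word` and the vocabulary sizes are
hypothetical.

```python
import math


def bits_per_word(vocabulary_size: int) -> float:
    """Information (in bits) needed to pick one word from an equiprobable vocabulary."""
    if vocabulary_size < 1:
        raise ValueError("vocabulary_size must be at least 1")
    return math.log2(vocabulary_size)


# Hypothetical vocabulary sizes, chosen only for illustration.
for size in (3, 32, 1024):
    print(f"{size:5d} words -> {bits_per_word(size):.2f} bits per word")
```

Under this assumption, a set of three messages asks the listener to resolve
fewer than two bits per announcement, while a vocabulary of about a thousand
words asks for ten. The less that must be resolved, the more of the signal
can be lost before two candidates are confused.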
In general, "speech perception ... is a process in
which the detection procedure probably is tailored to fit the signal
and the listening task. If the listener is able to impose a linguistic
organization upon the sounds, he may use the information that is
temporally dispersed to arrive at a decision about a given sound
element. If such an association is not made, the decision tends
to be made more upon the acoustic factors of the moment and in comparison
to whatever standard is available." (Speech Analysis, p. 306)
It would seem that, regardless of the native language of the listener,
if the message possibilities were known and anticipated, then
intelligibility would be better. This would be particularly true
if the message were specifically chosen to use the most common
phonemes that are readily recognizable and distinguishable in most
languages (or perhaps only in those languages that are present in
the ethnic cross section of the region where a public building
is constructed).
For example, if at a public assembly a warning signal were sounded
to alert those in attendance that important instructions were to
follow, and if the possible message choices were limited to perhaps
three that had been made familiar to the audience earlier, then
the chances that all would comprehend the message would go up dramatically.
This simple system could readily be employed and maintained consistently
over a given geographical area; however, a national solution is certainly
more practical.
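The effect of a small, familiar message set can be sketched as closed-set
identification. This is a toy model under stated assumptions: letters stand
in for phonemes, an underscore marks a segment masked by noise, and the
three messages, the constant `EMERGENCY_SET`, and the helper names are all
hypothetical.

```python
MASK = "_"  # stands in for a segment masked by noise


def is_consistent(heard: str, candidate: str) -> bool:
    """True if every unmasked symbol in `heard` agrees with `candidate`."""
    return len(heard) == len(candidate) and all(
        h == MASK or h == c for h, c in zip(heard, candidate)
    )


def candidates(heard: str, known_messages: list[str]) -> list[str]:
    """All known messages that the unmasked part of `heard` does not rule out."""
    return [m for m in known_messages if is_consistent(heard, m)]


# Hypothetical three-message emergency set, made familiar in advance.
EMERGENCY_SET = ["LEAVE NOW", "STAY HERE", "TAKE COVER"]

# The same degraded signal, judged against a closed set and a larger vocabulary.
heard = "STAY H__E"
print(candidates(heard, EMERGENCY_SET))                  # one candidate remains
print(candidates(heard, EMERGENCY_SET + ["STAY HOME"]))  # two candidates remain
```

With only the three familiar messages, the masked segments do not matter
because a single candidate survives. Once the listener must also consider
other plausible phrases, the same degraded signal becomes ambiguous.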
Additional methods that can be used to improve intelligibility
involve improving the signal-to-noise ratio. The best way to do
this consistently is to increase the signal level without distortion.
This could be implemented as an adaptive filtering scheme that adjusts
the equalization of the system in real time to optimize it for the
given message.
This optimization would involve eliminating most of the low-
and high-frequency acoustic output while focusing on the vocal range.
The level over this band could then be increased and, if appropriate
devices had been chosen, the system could reproduce the message
considerably louder than the level at which it is run for normal playback.
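The headroom argument above can be sketched numerically. The example below
is a rough illustration, not a design procedure: it models the program
material as a sum of sinusoids, applies an ideal (brick-wall) band limit,
and reports how much gain the band-limited signal could take before reaching
the peak level of the full-range signal. The 300 to 3400 Hz band edges are
an assumption (the classic telephone speech band), not a figure from this
article, and the component list and function names are hypothetical.

```python
import math

# Assumed speech band in Hz; not taken from the article.
SPEECH_BAND = (300.0, 3400.0)


def band_limit(components, band=SPEECH_BAND):
    """Keep only the (frequency_hz, amplitude) pairs inside the band (ideal filter)."""
    low, high = band
    return [(f, a) for f, a in components if low <= f <= high]


def peak_level(components, sample_rate=48000, duration_s=0.1):
    """Largest absolute sample value of the summed sinusoids."""
    n_samples = int(sample_rate * duration_s)
    return max(
        abs(sum(a * math.sin(2 * math.pi * f * n / sample_rate) for f, a in components))
        for n in range(n_samples)
    )


def headroom_db(components, band=SPEECH_BAND):
    """Gain in dB that raises the band-limited peak to the full-range peak."""
    limited_peak = peak_level(band_limit(components, band))
    if limited_peak == 0:
        raise ValueError("no signal energy inside the band")
    return 20 * math.log10(peak_level(components) / limited_peak)


# Hypothetical program material: bass, two speech-band tones, and treble.
program = [(60.0, 1.0), (500.0, 0.5), (2000.0, 0.3), (9000.0, 0.2)]
print(f"Available boost: {headroom_db(program):.1f} dB")
```

A real system would use filters with finite slopes and would be bounded by
amplifier and driver limits, so the figure printed here is only an upper
bound for this synthetic signal.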
Conclusions
It is apparent that this phoneme phenomenon deserves additional
attention, particularly in light of the ever-increasing ethnic diversity
that we experience in these United States. The problem is understood,
but its extent is not readily quantifiable. It nevertheless appears
to have the potential to be devastating.
Additional speech intelligibility research must be done with diverse
subjects, and methods of counteracting the degradation of speech
intelligibility for non-native speakers should be developed and
employed to give further assurance that the safety of the general
public is the principal goal of a successful sound reinforcement
system.
Jeff Rocha is director of loudspeaker design for EAW.
References
Beranek, Leo L.: Acoustics. Acoustical Society of America by
American Institute of Physics, 1986.
Flanagan, James L.: Speech Analysis, Synthesis and Perception. 2nd
Edition. Springer-Verlag: Berlin, Heidelberg, New York, 1972.
Fry, Dennis: Homo Loquens: Man as a Talking Animal. Cambridge University
Press: Cambridge, London, New York, Melbourne, 1977.
Lass, Norman J., Ed.: Contemporary Issues in Experimental Phonetics.
Academic Press: New York, San Francisco, London, 1976.
Lathi, B.P.: Modern Digital and Analog Communication Systems. 2nd
Edition. Holt, Rinehart and Winston, Inc.: Philadelphia, Fort Worth,
Chicago, San Francisco, Montreal, Toronto, London, Sydney, Tokyo,
1989.
Olson, Harry F.: Acoustical Engineering. Professional Audio Journals,
Inc.: Philadelphia, Pennsylvania, 1991.