Did You Say Rock Or Wock? Native Language & Speech Intelligibility Problems

But these variations are typically less language dependent and are fewer in number than the differing phonemes.

“So much emphasis has been placed on the phoneme level of operation because this is where the main ear-work of speech takes place. [intonation and rhythm, while important to comprehension, involve a significantly smaller number of categories]…the English system, for example, functions with six tones and only two rhythmic categories, formed by the strong syllables and the weaker ones.” (Homo Loquens, page 72)

All of this is to say nothing of the tremendous differences in sentence construction between various languages that can add to or detract from one’s ability to achieve comprehension from context.

Simple things like adjectives preceding or following nouns can severely obstruct one’s ability to gather meaning.

In essence, there are several logical explanations that describe the perceived inability of non-native speakers to comprehend a familiar language, particularly when spoken in a noisy environment.

Speech Intelligibility Derivations
The goal in developing a good speech transmission system is to determine what conditions are necessary for maximum intelligibility. The term intelligibility “… is used to signify the accuracy and ease with which the articulated sounds of speech are recognized.” (Olson, page 495)

The criteria used to determine the effectiveness of this speech transmission system are intelligibility indices that are based upon signal and noise levels over specified bandwidths. The fundamental methods used to determine intelligibility involve “… pronouncing speech sounds into one end of a transmission system and having the observer write the sounds that are heard at the receiving end.” (Olson, page 495)

“According to the work of French and Steinberg and of Beranek, if the spectrum levels of speech at a listener’s ear are such that the shaded region of [the figure] lies above the threshold of hearing of the listener and above the ambient noise, but below the overload line, all syllables of the speech will be audible to the listener and the speech intelligibility will be nearly perfect. This corresponds to an articulation index of 100 percent….” The percentage articulation index is defined as the ratio (times 100) of the speech area not covered over by [noise, the threshold of hearing, or overload] to the total speech area… (Beranek pp. 408-409)
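The area ratio in that definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Beranek's full procedure: the speech area is approximated by a handful of frequency bands that each contribute equally, every level is a single number in dB per band, and the function and field names are hypothetical.

```python
def articulation_index_percent(bands):
    """Return the percentage articulation index for a list of bands.

    Each band is a dict of levels in dB (hypothetical field names):
      speech_min, speech_peak : lower and upper edge of the speech area
      noise, threshold        : ambient noise and threshold of hearing
      overload                : overload line of the ear
    Assumption: every band carries equal weight.
    """
    total_area = 0.0
    uncovered_area = 0.0
    for band in bands:
        span = band["speech_peak"] - band["speech_min"]
        if span <= 0:
            raise ValueError("speech_peak must exceed speech_min")
        # The speech area is covered from below by whichever is higher,
        # the noise or the threshold of hearing, and from above by overload.
        floor = max(band["speech_min"], band["noise"], band["threshold"])
        ceiling = min(band["speech_peak"], band["overload"])
        total_area += span
        uncovered_area += max(0.0, ceiling - floor)
    return 100.0 * uncovered_area / total_area


# Illustrative numbers only: one band is fully clear, one is half masked by noise.
example_bands = [
    {"speech_min": 30, "speech_peak": 60, "noise": 20, "threshold": 10, "overload": 100},
    {"speech_min": 30, "speech_peak": 60, "noise": 45, "threshold": 10, "overload": 100},
]
example_index = articulation_index_percent(example_bands)
```

With these made-up levels, 45 dB of the 60 dB total speech span remains uncovered, so the sketch returns 75 percent.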

In order to calculate these quantities for a theoretical system, the gain of the system, coupled with the directivity index of the amplification system and the reverberant characteristics of the space, can be used to determine the average, peak, and minimum levels of speech and noise in a given space.
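As a rough illustration of that calculation, the sketch below uses the common textbook relation for the level in a room as the sum of a direct field and a reverberant field. The relation is not taken from this article, and several things are assumptions: the source's sound power level is taken to already include the system gain, the room is described by a single room constant in square metres, and the default 12 dB and 18 dB offsets for speech peaks and minima around the average are illustrative values. All function and parameter names are hypothetical.

```python
import math


def level_at_listener(sound_power_level_db, directivity_index_db,
                      distance_m, room_constant_m2):
    """Estimate the sound pressure level (dB) at a listener.

    Textbook direct-plus-reverberant relation (an assumption here):
      Lp = Lw + 10*log10(Q / (4*pi*r^2) + 4 / R)
    where Q is the directivity factor, Q = 10**(DI / 10).
    """
    directivity_factor = 10 ** (directivity_index_db / 10)
    direct_term = directivity_factor / (4 * math.pi * distance_m ** 2)
    reverberant_term = 4 / room_constant_m2
    return sound_power_level_db + 10 * math.log10(direct_term + reverberant_term)


def speech_range(average_db, peak_offset_db=12.0, min_offset_db=18.0):
    """Return (minimum, peak) speech levels around an average level.

    The default offsets are assumed values for illustration only.
    """
    return average_db - min_offset_db, average_db + peak_offset_db


# Illustrative numbers only.
average_level = level_at_listener(100.0, 0.0, 1.0, 16 * math.pi)
minimum_level, peak_level = speech_range(average_level)
```

The minimum and peak levels from a calculation like this, taken per frequency band, are the edges of the speech area that the articulation index compares against the noise.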

The problem here is that the tests that were used to arrive at these conclusions involved native speakers of English listening to native speakers of English. Beranek and others do recognize the significance of “psychological and linguistic” factors: different native speakers, different word lengths, and trained or untrained listeners all yield dramatically different articulation results, which makes “absolute predictions of articulation scores … not possible”. Nevertheless, the contention remains that “one can say that if the calculated articulation index exceeds 60 per cent, a speech-communication system is probably satisfactory.” (Beranek, p. 415)

Much of the basis for the additional indices has evolved from these early studies into speech intelligibility. These indices include STI (a general-purpose speech intelligibility index based upon SNR and reverberation), RASTI (similar to STI but requiring less data), and %ALCons (the percentage articulation loss of consonants, that is, the share of consonants that will not be detected clearly; consonant recognition is paramount to comprehension).

More recent social and technological changes require that additional steps be taken to ensure public safety. The original conclusions are all based upon the fallacy that the vast majority of speech takes place between native speakers of a common language, and therefore that the resulting indices will suffice for all communication.

This was perhaps true at some point in the past. As technology expands and the world in essence shrinks, people with diverse language histories will frequently come into contact with one another. The simple experiments conducted by Professor Campbell indicate that non-native-speaking students in one controlled environment correctly identified fewer than 40 percent of the words. Clearly, additional work must be done in this area.