Section 3: Statistical Measures of Speech Intelligibility
Statistical intelligibility measurements use human beings, rather than electronic test instruments, to assess speech communication systems.
First proposed in 1910 and refined with the introduction of the telephone and the advent of electronic communication systems in World War II, such tests are still considered the most accurate and reliable measures of intelligibility.
While many variations are in use, this discussion deals most directly with the American National Standards Institute’s approved procedure (ANSI S3.2-1989, “Method for Measuring the Intelligibility of Speech Over Communication Systems”).
Method and Applications
The statistical measurement process uses trained, English-fluent talkers speaking standardized word lists through the communication system to trained, English-fluent listeners. The word lists are crafted to evaluate specific aspects of speech transmission; the ability of the listeners to identify individual words or word pairs indicates the quality of the transmission.
Such tests are used in a wide variety of applications, from examining the acoustics of conference rooms to evaluating intercoms for deep-sea divers. In professional sound reinforcement, statistical tests provide crucial information for architects and consultants, both in designing speech reinforcement systems and refining their performance in the field. They may also be used to evaluate the contributions that specific microphones, loudspeakers and signal processors make to speech intelligibility.
For the results of any intelligibility test to be valid, those conducting the test must be well versed in experimental design and statistical data analysis. Since human subjects are central to the tests, the experimenters must also understand the psychological factors involved, including the effects of motivation and learning through repetition. Finally, they must know how to operate the sound system properly so as to avoid introducing errors. For all of these reasons, intelligibility tests are invariably conducted by trained consultants who specialize in the field.
The tests use a minimum of five talkers and five listeners; larger subject groups reduce the margin of error. Talkers and listeners are selected to ensure a representative cross-section of age and gender.
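The effect of group size on the margin of error can be illustrated with a minimal Python sketch. The function name, the listener scores and the use of a normal-approximation multiplier (1.96) are assumptions made for illustration; none of them comes from the ANSI procedure, and with very small panels a Student's t multiplier would give a wider interval.

```python
import math
import statistics


def margin_of_error(scores, z=1.96):
    """Approximate half-width of a 95% interval around the mean score.

    Computed as z * (sample standard deviation) / sqrt(n), so it shrinks
    as the number of listeners n grows.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    return z * statistics.stdev(scores) / math.sqrt(n)


# Hypothetical percent-correct scores, one per listener (illustration only).
five_listeners = [88.0, 92.0, 85.0, 90.0, 95.0]
# Same spread of scores, but twice as many listeners.
ten_listeners = five_listeners * 2

moe_5 = margin_of_error(five_listeners)
moe_10 = margin_of_error(ten_listeners)

print(f"5 listeners:  +/- {moe_5:.2f} percentage points")
print(f"10 listeners: +/- {moe_10:.2f} percentage points")
```

With the same spread of scores, the ten-listener panel yields the narrower interval, which is the sense in which larger groups reduce the margin of error.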
All must speak English as their first language and have normal hearing. Talkers must have good articulation, and are trained both to speak at a consistent level and to synchronize their words with timing signals so that the rate of presentation doesn’t skew the test results in any way. Listeners must have good discrimination, and are familiarized with all the test words that will be used, the sound of each talker’s voice and the method of recording responses.
A number of specialized word lists are in common use for testing various aspects of speech communication. The ANSI standard specifies three:
—The Modified Rhyme Test
—The Diagnostic Rhyme Test
—The set of twenty Phonetically Balanced Word Lists
Other examples of word lists include:
—The Diagnostic Alliteration Test
—The Diagnostic Medial Consonant Test
—The Spelling Alphabet Test
If at all possible, the sound system should be tested under conditions of actual use: if there are potential sources of masking noise such as outside traffic or an HVAC system, these should be present during the testing and documented for the report.
It’s also important that the system gains be set to produce a representative sound pressure level. Pre-recorded test material can be used as long as the recording and playback equipment don’t introduce significant noise or distortion.
At a minimum, each talker is given three PB or MRT word lists (or the complete DRT list) to read. Where only one sound system is being tested, the trained subjects are first tested face-to-face or in similarly ideal conditions to establish a “control” or baseline measurement. (Under these circumstances the intelligibility should be nearly perfect.)
This score is then used as a reference against which the system under test can be compared. During testing, supplementary information such as the speed and certainty of the listeners’ responses and their subjective opinions about the sound system should be gathered.
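One simple way to compare the system under test with the baseline is a per-listener difference in percentage points. The Python sketch below assumes that approach; the function name and all scores are hypothetical, and the standard itself may call for a different comparison.

```python
def difference_from_baseline(system_scores, baseline_scores):
    """Per-listener change in percent correct (system minus baseline)."""
    if len(system_scores) != len(baseline_scores):
        raise ValueError("each listener needs both a baseline and a system score")
    return [s - b for s, b in zip(system_scores, baseline_scores)]


# Hypothetical percent-correct scores for five listeners (illustration only).
baseline = [99.0, 98.0, 100.0, 97.0, 99.0]  # face-to-face control condition
system = [90.0, 88.0, 93.0, 85.0, 91.0]     # same listeners, through the sound system

differences = difference_from_baseline(system, baseline)
mean_difference = sum(differences) / len(differences)

print(f"Mean change relative to baseline: {mean_difference:.1f} percentage points")
```

Pairing each listener's two scores keeps individual differences in hearing and attention from being mistaken for an effect of the sound system.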
Analyzing the Results
There are many ways of analyzing the test data, depending on the characteristics of the particular word list and the variables being tested. At the least, a set of percentage scores is calculated showing the proportion of words that each listener identified correctly. Averaging these produces a single overall score. If either the DRT or MRT is used, the results are adjusted mathematically to account for guessing, since listeners choose from a fixed set of alternatives (no adjustment is required for the PB test). Deeper statistical analyses can yield more detailed information about the sound system if undertaken carefully.
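The scoring steps can be sketched in Python. The adjustment shown is the commonly used chance correction, in which the number of wrong answers divided by (alternatives - 1) is subtracted from the number of right answers. This document does not give the formula, so treat it as an assumption and check it against the standard before relying on it. The function name, the listener data and the 50-item, six-alternative list are likewise illustrative.

```python
def chance_corrected_percent(right, wrong, alternatives):
    """Percent correct adjusted for guessing on a closed-set test.

    Assumed formula: 100 * (right - wrong / (alternatives - 1)) / total,
    where total = right + wrong. A listener guessing at random scores
    about 0; a perfect listener scores 100.
    """
    total = right + wrong
    if total == 0:
        raise ValueError("no responses to score")
    adjusted_right = right - wrong / (alternatives - 1)
    return 100.0 * adjusted_right / total


# Hypothetical (right, wrong) counts for five listeners on a 50-item list
# with six response alternatives per item (illustration only).
listener_results = [(47, 3), (45, 5), (48, 2), (44, 6), (46, 4)]

scores = [chance_corrected_percent(r, w, alternatives=6) for r, w in listener_results]
overall_score = sum(scores) / len(scores)

for number, score in enumerate(scores, start=1):
    print(f"Listener {number}: {score:.1f}%")
print(f"Overall: {overall_score:.1f}%")
```

For a two-alternative test such as a word-pair list, the same function is called with `alternatives=2`, so that a listener who gets half the items right by guessing receives an adjusted score of zero.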