Statistical measures of speech intelligibility
By John Roberts
Edited by Rachel Murray, P.E.
To read part 1 of this article, click
here.
To comment on this article or ask further questions, contact techsupport@meyersound.com
Statistical intelligibility measurements use human beings, rather
than electronic test instruments, to assess speech communication
systems.
First proposed in 1910 and refined with the introduction of the
telephone and the advent of electronic communication systems in
World War II, such tests are still considered to be the most accurate
and reliable measures of intelligibility. While many variations
are in use, this discussion deals most directly with the American
National Standards Institute’s approved procedure (ANSI S3.2-1989,
“Method for Measuring the Intelligibility of Speech Over Communication
Systems”).
Method and Applications
The statistical measurement process uses trained, English-fluent
talkers speaking standardized word lists through the communication
system to trained, English-fluent listeners. The word lists are
crafted to evaluate specific aspects of speech transmission; the
ability of the listeners to identify individual words or word pairs
indicates the quality of the transmission.
Such tests are used in a wide variety of applications, from examining
the acoustics of conference rooms to evaluating intercoms for deep-sea
divers. In professional sound reinforcement, statistical tests provide
crucial information for architects and consultants, both in designing
speech reinforcement systems and refining their performance in the
field.
They may also be used to evaluate the contributions that specific
microphones, loudspeakers and signal processors make to speech intelligibility.
Preparation
In order for the results of any intelligibility test to be valid,
those conducting the test must be well versed in experimental design
and statistical data analysis. Since human subjects are central
to the tests, the experimenters must also understand the psychological
factors involved, including the effects of motivation and learning
through repetition.
Finally, they must, of course, know how to operate the sound system
properly so as to avoid introducing errors. For all of these reasons,
intelligibility tests are invariably conducted by trained consultants
who specialize in the field.
The tests use a minimum of five talkers and five listeners; larger
subject groups reduce the margin of error. Talkers and listeners
are selected to ensure a representative cross-section of age and
gender. All must speak English as their first language and have
normal hearing.
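The effect of group size on the margin of error can be illustrated with a short sketch. It is not part of the ANSI procedure: the function name, the example scores and the use of a normal-approximation multiplier (1.96 for roughly 95% confidence) are all assumptions made for illustration. With panels this small, a t-distribution multiplier would give a wider and more honest interval.

```python
import math
import statistics


def margin_of_error(scores, z=1.96):
    """Approximate margin of error for the mean of per-listener scores.

    Assumption: normal approximation (z = 1.96 for about 95% confidence).
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # Standard error of the mean shrinks with the square root of the group size.
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return z * standard_error


# Hypothetical percent-correct scores, invented for this example.
five_listeners = [88.0, 92.0, 85.0, 90.0, 95.0]
ten_listeners = five_listeners * 2  # same spread, twice as many listeners

print(margin_of_error(five_listeners))
print(margin_of_error(ten_listeners))
```

With the same spread of scores, the larger panel yields the smaller margin, because the standard error falls roughly with the square root of the number of listeners.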
Talkers must have good articulation, and are trained both to speak
at a consistent level and to synchronize their words with timing
signals so that the rate of presentation doesn’t skew the
test results in any way. Listeners must have good discrimination,
and are familiarized with all the test words that will be used,
the sound of each talker’s voice and the method of recording
responses.
A number of specialized word lists are in common use for testing
various aspects of speech communication. The ANSI standard specifies
three:
* The Modified Rhyme Test
* The Diagnostic Rhyme Test
* The set of twenty Phonetically Balanced Word Lists
Other examples of word lists include:
* The Diagnostic Alliteration Test
* The Diagnostic Medial Consonant Test
* The Spelling Alphabet Test
Editor’s Note: For more details on tests, click
here.
Testing
If at all possible, the sound system should be tested under conditions
of actual use: if there are potential sources of masking noise such
as outside traffic or an HVAC system, these should be present during
the testing and documented for the report.
It’s also important that the system gains be set to produce a
representative sound pressure level. Pre-recorded test material can be used as
long as the recording and playback equipment don’t introduce
significant noise or distortion.
At a minimum, each talker is given three PB or MRT word lists -
or the complete DRT list - to read. Where only one sound system
is being tested, the trained subjects are first tested face-to-face
or in similarly ideal conditions to establish a “control”
or baseline measurement. (Under these circumstances the intelligibility
should be nearly perfect.)
This score is then used as a reference to which the system under
test can be compared. During testing, supplementary information
such as the speed and certainty of the listeners’ responses and
their subjective opinions about the sound system should be gathered.
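One simple way to express the comparison against the control score is sketched below. Both function names and the example figures are hypothetical, and neither calculation is taken from the ANSI standard; they only show two common ways of relating a measured score to a baseline.

```python
def intelligibility_loss(control_score, system_score):
    """Percentage points lost relative to the face-to-face control."""
    return control_score - system_score


def relative_score(control_score, system_score):
    """System score expressed as a percentage of the control score."""
    if control_score <= 0:
        raise ValueError("control score must be positive")
    return 100.0 * system_score / control_score


# Hypothetical scores, invented for this example.
control = 98.0  # face-to-face baseline, expected to be nearly perfect
system = 86.0   # same talkers and listeners through the sound system

print(intelligibility_loss(control, system))
print(relative_score(control, system))
```

Reporting the loss in percentage points keeps the result in the same units as the raw scores, while the relative figure normalizes for a panel whose baseline falls short of 100%.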
Analyzing the Results
There are many ways of analyzing the test data depending on the
characteristics of the particular word list and the variables being
tested. At the least, a set of percentage scores is calculated showing
the proportion of words each listener identified correctly.
Averaging these scores produces a single overall score. If
either the DRT or MRT is used, the results are adjusted mathematically
to account for guessing (no adjustment is required for the PB test).
Deeper statistical analyses can yield more detailed information
about the sound system if undertaken carefully.
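The basic scoring described above can be sketched as follows. The article does not give the adjustment formula, so this sketch assumes the conventional correction for guessing used with closed-response tests, adjusted score = 100 x (R - W / (n - 1)) / T, where R is the number right, W the number wrong, T the number of items and n the number of response alternatives. The values n = 6 for the MRT and n = 2 for the DRT, along with all function names and tallies, are assumptions for illustration and should be checked against the standard.

```python
from statistics import mean

# Assumed number of response alternatives per closed-response test.
CHOICES = {"MRT": 6, "DRT": 2}


def percent_correct(right, total):
    """Raw percentage of items identified correctly."""
    return 100.0 * right / total


def adjusted_percent(right, wrong, total, choices):
    """Percent correct after the conventional correction for guessing."""
    if choices < 2:
        raise ValueError("a closed-response test needs at least two choices")
    return 100.0 * (right - wrong / (choices - 1)) / total


def listener_score(test, right, wrong, total):
    """Score one listener; open-response tests such as PB are not adjusted."""
    choices = CHOICES.get(test)
    if choices is None:
        return percent_correct(right, total)
    return adjusted_percent(right, wrong, total, choices)


def overall_score(test, tallies):
    """Average the per-listener scores into a single overall score."""
    return mean([listener_score(test, *tally) for tally in tallies])


# Hypothetical (right, wrong, total) tallies for two listeners on 50 MRT items.
mrt_tallies = [(45, 5, 50), (50, 0, 50)]
print(overall_score("MRT", mrt_tallies))
```

The correction subtracts the number of correct answers a listener would be expected to get by chance, which is why it matters most for the two-alternative DRT and not at all for an open-response test.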
Again, Meyer encourages your comments and questions regarding
this article. Contact techsupport@meyersound.com