Section 4: Machine Measures of Speech Intelligibility
Statistical tests using trained talkers and listeners are by far the most accurate and reliable methods for intelligibility testing. Unfortunately, they are complicated to set up, time-consuming to conduct and require extensive statistical analysis to interpret.
Hence, consultants and acousticians have long sought an automated, machine-based test that could quickly and easily yield meaningful intelligibility scores for speech systems. A number of methods have emerged over the past fifty-odd years that fall into two basic categories: analyses of the reverberant field, and measurements based on signal-to-noise ratio.
From at least the ancient Classical period, architects have recognized that reverberation and echoes hamper intelligibility. Indeed, that realization resulted in the development of the Greek amphitheater, a durable architectural model that survives to this day.
Modern acousticians have at their disposal several different methods to test reverberation in enclosed spaces. The most commonly used of these are:
—%ALcons - a measure that’s familiar to many sound system engineers
—Useful-to-Detrimental Sound Ratios
—Early-to-Late Sound Energy Ratio
Each of these tests can tell us something about the reverberant qualities of a space and, therefore, how intelligible speech could be in that space. Since they deal predominantly with reverberation, however, they fail to take into account the majority of the factors that can affect a speech reinforcement system’s performance.
With the advent of electronic communication systems and their complex potential problems, acousticians and engineers recognized that different machine testing approaches were needed.
Beginning as early as the 1940s with telephony research at Bell Laboratories, several instrument-based tests have evolved, each of which relies on signal-to-noise measurements in one form or another. They are:
—AI - Articulation Index
—STI - Speech Transmission Index
—RASTI (another measure that’s familiar to some sound system engineers)
—SII - Speech Intelligibility Index
AI is now of interest chiefly for having demonstrated the relative importance of different frequency bands in the speech spectrum; because it doesn’t effectively account for reverberation, it has been largely superseded by the newer methods. Of these, only RASTI is available in a simple, reasonably-priced instrument.
SII (which is proposed as ANSI standard S3.5-1997) is the most robust of the machine intelligibility measures, but it requires sophisticated equipment and the calculations that it entails are quite complex. Given the prodigious computing power that’s now available at reasonable cost, however, a practical, affordable SII instrument could soon become a reality.
Limitations of Machine Measures
Their relative convenience notwithstanding, all machine-based intelligibility measures have inherent limitations.
Every machine testing method requires that the operator have significant experience and analytical skill if the results are to be accurate and useful. It can be very difficult to identify inaccurate or misleading scores and determine their causes. Most significantly, adjustments to the system that improve intelligibility may not positively affect the measured score - and adjustments that improve the measurements may not enhance intelligibility.
In addition to these factors, each testing method has its own particular limitations that must be weighed both when carrying out the tests and when interpreting the results.
%ALcons
Percentage Articulation Loss of Consonants. This machine measure of intelligibility is closely associated with the TEF sound analyzer. It is computed from measurements of the Direct-to-Reverberant Ratio and the Early Decay Time using a set of correlations defined by SynAudCon, and is specified in percent.
Since %ALcons expresses loss of consonant definition, lower values are associated with greater intelligibility. It is generally assumed that the maximum allowable value for typical paging applications is 10%, assuming that the environment is relatively free of masking noise. For learning environments and voice warning systems, the desired value is 5% or less.
The %ALcons method is widely used by acoustical consultants (particularly in the United States), but it has significant drawbacks. First, it is based on measurements in a single one-third octave band centered on 2 kHz; all other frequencies are ignored, so the system’s frequency response must be verified in some other way for the %ALcons score to be meaningful.
Moreover, the method does not account for many factors that can dramatically affect intelligibility, including signal-to-noise ratio, the background noise spectrum, distortion, late reflections or echoes, system frequency response, compression, non-linear phase, equalization and acoustic power. %ALcons measurements of sound systems therefore often yield overly optimistic scores. Where reverberation or strong, late-arriving reflections are the primary problem, however, %ALcons measurements can sometimes be more useful and accurate than RASTI.
Direct-to-Reverberant Ratio
The ratio between the intensities of the direct sound and the reverberation. There are several measures for this quantity. C50, one of the most popular, expresses speech clarity as the energy ratio of the first 50 milliseconds of direct sound to the overall steady-state reverberation, with 0 dB being the minimum acceptable value and +4 dB or above preferred.
A similar measure, C7, is used in Germany; C35 is yet another version. Measurements are made in a single frequency band (usually centered on 1 kHz). Each of these measures can be more reliable and repeatable than %ALcons, which also deals with the direct-to-reverberant ratio.
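As a rough illustration of how a clarity figure such as C50 might be computed, the sketch below splits a sampled room impulse response at 50 milliseconds and compares the energy on either side. The function name, the assumption that sample 0 is the arrival of the direct sound, and the choice to treat everything after the split as the reverberant part are all illustrative assumptions, not the procedure of any particular analyzer.

```python
import math

def clarity_db(impulse_response, sample_rate_hz, split_ms=50.0):
    """Early-to-late energy ratio, in dB, of a sampled impulse response.

    Assumption: index 0 is the arrival of the direct sound, and all
    energy after the split time counts as reverberation.
    """
    split_index = int(round(sample_rate_hz * split_ms / 1000.0))
    # Energy is the sum of squared pressure samples in each portion.
    early = sum(x * x for x in impulse_response[:split_index])
    late = sum(x * x for x in impulse_response[split_index:])
    if early <= 0.0 or late <= 0.0:
        raise ValueError("need non-zero energy on both sides of the split")
    return 10.0 * math.log10(early / late)

# Hypothetical response: equal energy before and after 50 ms gives 0 dB,
# the minimum acceptable value quoted above.
flat = [1.0] * 100           # 100 ms of samples at 1 kHz
print(clarity_db(flat, 1000))
```

Passing split_ms=35 or split_ms=7 would give figures in the style of C35 or C7, assuming the numeric suffix names the split time in milliseconds.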
Useful-to-Detrimental Sound Ratios
The logarithmic ratio between the energy of sounds that are useful to intelligibility and those that are detrimental to it, expressed in decibels.
“Useful” sounds are the integrated energy of speech sounds arriving within the first 50 or 80 milliseconds after the direct sound, and “detrimental” sounds are the sum of later-arriving speech energy and ambient noise. In practice, both quantities may be found by integrating appropriate portions of the room impulse response.
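The integration described above can be sketched in a few lines. This is a minimal example under stated assumptions: the impulse response starts at the direct sound, the ambient noise is supplied as a single energy value in the same units as the summed squared samples, and the function and parameter names are hypothetical.

```python
import math

def useful_to_detrimental_db(impulse_response, sample_rate_hz,
                             noise_energy, early_ms=50.0):
    """Useful-to-detrimental ratio in dB.

    Useful      = speech energy arriving within early_ms of the direct sound.
    Detrimental = later-arriving speech energy plus ambient noise energy.
    Assumption: noise_energy uses the same units as the squared samples.
    """
    split_index = int(round(sample_rate_hz * early_ms / 1000.0))
    useful = sum(x * x for x in impulse_response[:split_index])
    late = sum(x * x for x in impulse_response[split_index:])
    detrimental = late + noise_energy
    if useful <= 0.0 or detrimental <= 0.0:
        raise ValueError("useful and detrimental energies must be positive")
    return 10.0 * math.log10(useful / detrimental)

# Hypothetical response with equal early and late energy.
response = [1.0] * 100       # 100 ms of samples at 1 kHz
print(useful_to_detrimental_db(response, 1000, noise_energy=0.0))
print(useful_to_detrimental_db(response, 1000, noise_energy=50.0))
```

With no noise the result reduces to a plain early-to-late ratio; adding noise energy equal to the late energy lowers the figure by about 3 dB. Setting early_ms=80.0 gives the 80-millisecond variant mentioned above.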
Early-to-Late Sound Energy Ratio
Proposed in 1996 by G. Marshall, ELR is similar to C50 but is weighted for speech and incorporates measurements in more than one frequency band. As with other direct-to-reverberant methods, however, factors other than reverberation are not accounted for.
Articulation Index
One of the earliest attempts to measure the intelligibility of a speech transmission system by machine, the Articulation Index (AI) was developed by Bell Telephone Laboratories in the 1940s.
AI is based on the idea that the response of a speech communication system can be divided into twenty frequency bands, each of which carries an independent contribution to the intelligibility of the system, and that the total contribution of all the bands is the sum of the contributions of the individual bands. (AI may also be measured using one-third octave or octave bands.) Signal-to-noise ratios are computed for each individual band, then weighted and combined to yield an intelligibility score.
The AI varies in value from 0 (completely unintelligible) to 1 (perfect intelligibility). An AI of 0.3 or below is considered unsatisfactory, 0.3 to 0.5 satisfactory, 0.5 to 0.7 good, and greater than 0.7 very good to excellent.
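The band-by-band summation can be illustrated with a short sketch. It assumes equal weights when none are supplied and maps each band's signal-to-noise ratio linearly onto a 0 to 30 dB range; both the range and the function names are assumptions made for illustration rather than values taken from the text. The rating function simply encodes the score ranges quoted above.

```python
def articulation_index(band_snr_db, weights=None, snr_range_db=30.0):
    """Weighted sum of per-band contributions, giving a value from 0 to 1.

    Assumptions: each band's SNR is clipped to 0..snr_range_db and scaled
    to 0..1; weights default to equal values and should sum to 1.
    """
    count = len(band_snr_db)
    if weights is None:
        weights = [1.0 / count] * count
    if len(weights) != count:
        raise ValueError("one weight is needed per band")
    total = 0.0
    for snr_db, weight in zip(band_snr_db, weights):
        contribution = min(max(snr_db, 0.0), snr_range_db) / snr_range_db
        total += weight * contribution
    return total

def rate_ai(ai):
    """Map an AI value to the descriptive ratings quoted in the text."""
    if ai <= 0.3:
        return "unsatisfactory"
    if ai <= 0.5:
        return "satisfactory"
    if ai <= 0.7:
        return "good"
    return "very good to excellent"

# Hypothetical system: twenty bands, each with a 24 dB signal-to-noise ratio.
score = articulation_index([24.0] * 20)
print(round(score, 2), rate_ai(score))
```

Because each band contributes independently, a poor signal-to-noise ratio in a few bands lowers the total only in proportion to the weights those bands carry.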
Speech Transmission Index
Developed in the early 1970s, the Speech Transmission Index (STI) is a machine measure of intelligibility whose value varies from 0 (completely unintelligible) to 1 (perfect intelligibility).
In STI testing, speech is modeled by a special test signal with speech-like characteristics. Following on the concept that speech can be described as a fundamental waveform that is modulated by low-frequency signals, STI employs a complex amplitude modulation scheme to generate its test signal. At the receiving end of the communication system, the depth of modulation of the received signal is compared with that of the test signal in each of a number of frequency bands. Reductions in the modulation depth are associated with loss of intelligibility.
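The comparison of modulation depths can be sketched as follows. The conversion used here (modulation reduction to an apparent signal-to-noise ratio, clipped to plus or minus 15 dB and scaled to the range 0 to 1) follows the commonly published STI formulation and is not spelled out in the text above, so treat it as an assumption; the function names are hypothetical, and the unweighted mean is a simplification of the band weighting a full STI calculation applies.

```python
import math

def transmission_index(sent_depth, received_depth, snr_limit_db=15.0):
    """Convert one modulation-depth comparison to an index from 0 to 1.

    Assumption: the reduction factor m = received / sent maps to an
    apparent SNR of 10*log10(m / (1 - m)), clipped to +/- snr_limit_db.
    """
    if sent_depth <= 0.0:
        raise ValueError("the test signal must have a positive modulation depth")
    m = received_depth / sent_depth
    if m <= 0.0:
        return 0.0           # modulation completely lost
    if m >= 1.0:
        return 1.0           # modulation fully preserved
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    clipped = max(-snr_limit_db, min(snr_limit_db, apparent_snr_db))
    return (clipped + snr_limit_db) / (2.0 * snr_limit_db)

def mean_transmission_index(depth_pairs):
    """Unweighted mean over (sent, received) pairs; a simplification."""
    values = [transmission_index(sent, received) for sent, received in depth_pairs]
    return sum(values) / len(values)

# Hypothetical measurement: one band keeps its full modulation depth,
# another keeps only half of it.
print(mean_transmission_index([(1.0, 1.0), (1.0, 0.5)]))
```

A reduction to half the original modulation depth corresponds to an apparent signal-to-noise ratio of 0 dB and therefore an index of 0.5, which shows how a loss of modulation depth translates directly into a lower score.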
RASTI
Rapid Speech Transmission Index, a machine method of testing for intelligibility in sound systems that is associated with Brüel & Kjær, the instrumentation company that manufactures a portable device to implement it.
RASTI was developed as a simpler alternative to the more complex STI (Speech Transmission Index). In contrast to STI, RASTI measures only in two octave bands centered at 500 Hz and 2 kHz, respectively. It uses a speech-like excitation signal and, like STI, correlates reductions in modulation depth to loss of intelligibility.
RASTI has been implemented in a simple, portable instrument that can make very rapid intelligibility measurements, both acoustically and with an installed sound system. For this reason, it has been adopted for a number of European standards and civil system specifications. Being a radically simplified version of STI, however, it suffers compromises that have forced reevaluation of those standards.
For example, RASTI tests in only two frequency bands, with the assumption that the sound system’s response actually extends in a reasonably flat fashion from 100 Hz or lower to 8 kHz or higher. While this might well be the case in a properly-designed auditorium system, many types of paging systems fall short of such performance. In these cases, RASTI almost invariably gives an overly optimistic picture. (In fact, a sound system that reproduced only the two frequency bands in question could receive a perfect rating.)
Moreover, because it affects modulation depth, any compression or limiting in the system can cause an artificially low RASTI value - despite the fact that it may, in actuality, be acting to enhance intelligibility. RASTI also does not take system distortion or non-linear amplitude and phase into account.
Speech Intelligibility Index
Derived from and in essence identical to STI, the Speech Intelligibility Index (SII) is the method for measuring speech intelligibility by machine that is currently proposed in draft form as ANSI Standard S3.5-1997.
In the Standard, four measurement procedures are allowed, each using a different number and size of frequency bands. In descending order of accuracy, they are:
—Critical band (21 bands)
—One-third octave band (18 bands)
—Equally-contributing critical band (17 bands)
—Octave band (6 bands)
The value of SII varies from 0 (completely unintelligible) to 1 (perfect intelligibility).
SII is a highly capable testing method that, under the right conditions, shows good correlation with statistical tests. It features both wide bandwidth (150 Hz to 8.5 kHz) and, especially in the critical band procedure, far greater resolution than any other method. SII properly includes reverberation, noise and distortion, all of which are accounted for in the modulation transfer function. Experienced test operators can go beyond generating a single intelligibility score to diagnosing the source of a loss in intelligibility.
Under certain conditions, however, SII can yield misleading results. In particular, late-arriving reflections and echoes can distort the measurement significantly. Like RASTI, SII is susceptible to giving artificially low intelligibility scores if compression or limiting is introduced in the system. And because even the critical-band procedure ignores frequencies below 100 Hz, it may very well miss significant low-frequency masking sources.
Finally, SII does not take non-linear phase into account. Nonetheless, when used correctly by a skilled operator, it remains the most reliable and accurate of the machine methods.