Audio Signal Paths
The primary audio paths for a smart speaker run between the device and the IVA or a network server, over the internet via a Wi-Fi or wired connection. On the input side, a speech signal containing a spoken command is captured by the device’s microphone array, digitized and uploaded to the IVA for signal processing and command interpretation. On the output side, digital audio content is transmitted from a web server to the device, where it is converted from digital to analog and then to an acoustic signal as it is played over the device’s loudspeaker system.
In addition to the two primary paths above, smart speakers may have several other audio paths, including:
• An analog output jack for connecting to an external powered speaker system.
• An analog input jack for using the smart speaker as a simple powered speaker.
• Bluetooth connection for playing audio content on an external Bluetooth speaker, streaming content from a smartphone or tablet as a music source, or in some cases acting as a hands-free device for telephone calls.
• Network connections to other smart speakers for multi-room music, stereo pairing or intercom functionality.
• Connections to home automation devices, e.g., for two-way intercom connection to a security device, or audible status messages.
The audio subsystems of smart speakers have a multitude of components that contribute to overall performance and audio quality, including microphones and microphone arrays, A/D and D/A converters, power amplifiers, loudspeaker drivers, digital signal processors, audio codecs, etc.
In addition, several system level functions such as beamforming, echo cancellation, wake word recognition, etc. contribute to overall quality. At some stage, each of these components and systems must be tested. Testing end-to-end performance of an overall smart speaker system is also desirable.
Different test contexts – R&D, validation, production test, quality assurance – have different goals and different levels of access to subsystems and components. For example, during product design, R&D engineers might well be able to isolate the active crossover functionality of a system on a chip (SoC) by physically tapping into chip-level connections (and have the first-hand product knowledge to be able to use the resulting signals).
Similarly, for production test, manufacturers have the option of temporarily loading test-specific firmware into the device to enable functional tests that are not available in off-the-shelf units. For example, noise reduction could be disabled, allowing the microphone input system to be tested with sinusoidal signals instead of speech.
Testing the overall end-to-end performance of a smart speaker’s primary input and output audio paths can be quite challenging for the following reasons:
- Input to, and output from, a smart speaker are both acoustic, and acoustic test is by its nature more complex than electronic (analog or digital) audio test. Acoustic tests require calibrated microphones, usually an anechoic test chamber, and a quality loudspeaker system to stimulate DUT microphones.
- Smart speakers are inherently open-loop devices. On the input side, a signal (typically speech) is captured, digitized and transmitted to a server somewhere as a digital audio file. To assess the input path performance, the audio file must be retrieved from the server and analyzed in comparison to the signal that was generated in the first place. On the output side, audio content which originates as an audio file on a server is streamed to the device where it is converted to analog and played on the device’s loudspeaker system. To assess the output path performance, the device’s loudspeaker output must be measured with a measurement microphone and compared with the original signal from the server. The original signal is often in the form of an encoded audio signal (e.g., MP3 or AAC), which requires that it be decoded before analysis.
- The A/D and D/A converters in the device will invariably have different sample rates than the audio analyzer, requiring some form of compensation during analysis. [6]
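The sample-rate mismatch in the last point can be illustrated with a minimal sketch. It assumes the recorded signal is available as a list of samples and brings it to the analyzer's rate by linear interpolation. The function name `resample_linear` and the two rates are hypothetical, chosen only for illustration. A real analyzer would more likely use a band-limited (sinc or polyphase) resampler and would also have to handle clock drift between the two devices, which this sketch ignores.

```python
import math

def resample_linear(samples, rate_in, rate_out):
    """Resample `samples` from rate_in to rate_out by linear interpolation.

    Illustrative only: linear interpolation attenuates high frequencies,
    so it is not suitable for precision measurements.
    """
    if not samples:
        return []
    # Number of output samples that fit inside the input record
    n_out = ((len(samples) - 1) * rate_out) // rate_in + 1
    last = len(samples) - 1
    out = []
    for n in range(n_out):
        pos = n * rate_in / rate_out      # fractional index into the input
        i = min(int(pos), last)           # sample at or before pos
        frac = pos - i                    # distance past that sample
        nxt = samples[min(i + 1, last)]   # following sample (clamped at the end)
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

# Assumed example: a 1 kHz tone captured at 16 kHz, brought to 48 kHz
tone_16k = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(160)]
tone_48k = resample_linear(tone_16k, 16000, 48000)
```

Once both signals share a sample rate, they can be compared sample by sample or transformed to the frequency domain on a common frequency grid.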
Measuring Frequency Response
The most important aspect of the performance of any audio device is its frequency response. Frequency response is a type of “transfer function” measurement. For a device under test (DUT), it represents the magnitude and phase of the output from the DUT per unit input, as a function of frequency. Devices are often compared in terms of the “shape” of their frequency response curves, which typically refers to the magnitude response only (not phase), and in addition normalizes the magnitude to a reference value.
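As a rough illustration of "output per unit input" at a single frequency, the sketch below estimates the complex amplitude of a stimulus and a response with a single-bin discrete Fourier transform and divides one by the other. It assumes both signals share a sample rate and that the record holds a whole number of cycles of the test tone. The function names and the simulated DUT (half amplitude, 45 degree lag) are hypothetical and are not taken from any particular analyzer.

```python
import cmath
import math

def single_bin(samples, freq_hz, rate_hz):
    """Complex amplitude of `samples` at freq_hz (single-bin DFT)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / rate_hz)
              for k, x in enumerate(samples))
    return 2.0 * acc / len(samples)

def response_at(stimulus, response, freq_hz, rate_hz):
    """Return (magnitude in dB, phase in degrees) of output per unit input."""
    h = (single_bin(response, freq_hz, rate_hz)
         / single_bin(stimulus, freq_hz, rate_hz))
    return 20.0 * math.log10(abs(h)), math.degrees(cmath.phase(h))

# Simulated DUT (an assumption): halves the amplitude and lags by 45 degrees
rate, freq, n = 48000, 1000, 4800   # 4800 samples = 100 whole cycles
x = [math.sin(2 * math.pi * freq * k / rate) for k in range(n)]
y = [0.5 * math.sin(2 * math.pi * freq * k / rate - math.pi / 4)
     for k in range(n)]
mag_db, phase_deg = response_at(x, y, freq, rate)
```

For this simulated DUT the result is about -6.02 dB and -45 degrees. Repeating the calculation over a set of test frequencies yields the magnitude and phase curves that make up the frequency response.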
For example, the response magnitude might be normalized to its value at some reference frequency, say 1 kHz, such that the normalized curve passes through 0 dB at 1 kHz. Usually, a flat frequency response (constant response magnitude versus frequency) is desirable in audio systems to ensure that source material is faithfully recorded and reproduced without spectral coloration. Flat frequency response is quite achievable in electronic audio systems, but much more difficult in acoustic devices, especially loudspeakers.
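The normalization described above amounts to subtracting the level at the reference frequency from every point on the curve. The sketch below assumes a magnitude response given as ascending frequencies and levels in dB, and interpolates on a logarithmic frequency axis when the reference frequency falls between measured points. The function name and the data values are made up for illustration.

```python
import math

def normalize_response(freqs_hz, mags_db, ref_hz=1000.0):
    """Shift a magnitude curve (dB) so that it reads 0 dB at ref_hz.

    Assumes at least two points with frequencies in ascending order.
    """
    if not freqs_hz[0] <= ref_hz <= freqs_hz[-1]:
        raise ValueError("reference frequency outside measured range")
    # Find the segment containing ref_hz and interpolate on a log-frequency axis
    for i in range(len(freqs_hz) - 1):
        f0, f1 = freqs_hz[i], freqs_hz[i + 1]
        if f0 <= ref_hz <= f1:
            t = math.log(ref_hz / f0) / math.log(f1 / f0)
            ref_db = mags_db[i] + t * (mags_db[i + 1] - mags_db[i])
            break
    return [m - ref_db for m in mags_db]

# Made-up measurement: levels in dB SPL at four frequencies
freqs = [100.0, 500.0, 2000.0, 10000.0]
mags = [78.0, 84.0, 86.0, 80.0]
normalized = normalize_response(freqs, mags)
```

Because the shift is a constant offset in dB, the shape of the curve is unchanged, which is what allows devices with different absolute sensitivities to be compared.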