Effects of automotive microphone frequency response characteristics and noise conditions on speech and ASR quality -- an experimental evaluation

Michele Buccoli, Yu Du, Jacob Soendergaard, Simone Shawn Cazzaniga

TL;DR

This paper addresses how automotive microphone frequency response characteristics and driving noise affect both human speech quality and automatic speech recognition (ASR) performance. The authors implement a controlled experimental framework by simulating automotive front-end signals as $x(n) = f( s(n) * h(n) + v(n))$, using three car types, three noise types, and a cascade of FR filters to generate 113 microphone shapes. Speech quality is evaluated with ETSI TS 103 281 metrics (S-MOS, N-MOS) and listening effort, and WER is assessed with the Whisper tiny model; results show noise type dominates degradation, while bandwidth and FR peaks have limited impact on MOS, though low-end HP2 effects and elevated resonance above 10 kHz influence MOS. The findings offer practical guidance for microphone specification in automotive hands-free and ASR contexts and establish a framework for future work including more vehicle models, diverse driving scenarios, and integration with acoustic front ends.
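The signal model in the summary above, $x(n) = f(s(n) * h(n) + v(n))$, can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes $s(n)$ is clean speech, $h(n)$ is an impulse response for the acoustic path to the microphone, $v(n)$ is recorded driving noise, and $f$ is the emulated microphone frequency response. The names `simulate_front_end` and `one_pole_highpass` are hypothetical, and the first-order high-pass is a stand-in for the paper's cascade of FR filters, whose actual design is not given here.

```python
import numpy as np


def simulate_front_end(s, h, v, f):
    """Return x(n) = f(s(n) * h(n) + v(n)), with * as linear convolution.

    Assumes v has at least as many samples as s (hypothetical convention).
    """
    filtered_speech = np.convolve(s, h)[: len(s)]  # s(n) * h(n), trimmed to len(s)
    noisy = filtered_speech + v[: len(s)]          # add the driving noise v(n)
    return f(noisy)                                # apply the microphone response f


def one_pole_highpass(x, alpha=0.95):
    """Hypothetical stand-in for f: a first-order high-pass with y[0] = 0."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y


# Synthetic placeholders; real use would load speech, impulse response, and noise.
rng = np.random.default_rng(0)
s = rng.standard_normal(1600)
h = np.array([1.0, 0.5, 0.25])
v = 0.1 * rng.standard_normal(1600)
x = simulate_front_end(s, h, v, one_pole_highpass)
```

Passing `f` as a callable keeps the sketch agnostic about how the microphone response is realized, so a different filter cascade could be substituted without touching the mixing step.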

Abstract

When choosing microphones for automotive hands-free communication or Automatic Speech Recognition (ASR) applications, OEMs typically specify wideband, super-wideband, or even fullband requirements following established standard recommendations (e.g., ITU-T P.1110, ITU-T P.1120). In practice, it is often challenging to achieve the preferred bandwidth for an automotive microphone given the constraints on microphone placement inside the cabin and the automotive-grade environmental robustness requirements. At the same time, there seems to be no consensus or sufficient data on how each microphone characteristic affects actual performance. To address this gap, we used noise signals recorded in real vehicles under various driving conditions to experimentally study the relationship between the microphones' characteristics and both the final audio quality of speech communication and the performance of ASR engines. We focus on how variations in microphone bandwidth and amplitude frequency response shape affect perceptual speech quality. Speech quality is compared using ETSI TS 103 281 metrics (S-MOS, N-MOS, G-MOS) and ancillary metrics such as SNR. ASR results are evaluated with standard metrics such as Word Error Rate (WER). The findings clarify which microphone frequency response characteristics matter most for audio quality and inform the choice of proper microphone specifications, particularly for automotive applications.
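For readers unfamiliar with the ASR metric named in the abstract, WER is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions, and the paper's actual scoring pipeline and text normalization are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()   # assumed normalization: lowercase + split
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("turn on the radio", "turn on radio"))  # 0.25
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.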

Paper Structure

This paper contains 11 sections, 1 equation, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Representation of microphone’s FR characteristics emulated in this study.
  • Figure 2: S-MOS (a) and N-MOS (b) values separated by noise and car types.
  • Figure 3: S-MOS (a) and N-MOS (b) values separated by car and noise type.
  • Figure 4: A-weighted SNR as a function of noise (hue) and car type. The legend is the same as in Fig. \ref{fig:SNMOSall_huenoise}.
  • Figure 5: S-MOS values versus (a) high and (b) low cut-off frequencies.
  • ...and 6 more figures