Modeling of Speech-dependent Own Voice Transfer Characteristics for Hearables with In-ear Microphones

Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

TL;DR

Experimental results show that the proposed speech-dependent model simulates in-ear signals more accurately than a speech-independent model in terms of technical measures, especially under utterance mismatch and talker mismatch.

Abstract

Many hearables contain an in-ear microphone, which may be used to capture the user's own voice. However, because the hearable occludes the ear canal, the in-ear microphone mostly records body-conducted speech, which typically suffers from band-limitation effects and amplification at low frequencies. Since the occlusion effect is determined by the ratio between the air-conducted and body-conducted components of own voice, the own voice transfer characteristics between the outer face of the hearable and the in-ear microphone depend on the speech content and the individual talker. In this paper, we propose a speech-dependent model of the own voice transfer characteristics based on phoneme recognition, assuming a linear time-invariant relative transfer function for each phoneme. We consider both individual models and models averaged over several talkers. Experimental results based on recordings with a prototype hearable show that the proposed speech-dependent model simulates in-ear signals more accurately than a speech-independent model in terms of technical measures, especially under utterance mismatch and talker mismatch. Additionally, simulation results show that talker-averaged models generalize better to different talkers than individual models.
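
The idea of one linear time-invariant relative transfer function (RTF) per phoneme can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `estimate_rtfs` and `simulate_in_ear_stft` are hypothetical, the signals are assumed to be available as STFT matrices with one phoneme label per frame, each RTF is applied as a per-bin multiplication within a frame, and the RTFs are fitted with a plain least-squares estimator, which may differ from the authors' identification procedure.

```python
import numpy as np


def estimate_rtfs(outer_stft, in_ear_stft, phoneme_ids, num_phonemes, eps=1e-12):
    """Least-squares RTF per phoneme and frequency bin (illustrative assumption).

    outer_stft, in_ear_stft: complex arrays of shape (frames, bins)
    phoneme_ids: integer array of shape (frames,), one label per frame
    """
    outer_stft = np.asarray(outer_stft)
    in_ear_stft = np.asarray(in_ear_stft)
    phoneme_ids = np.asarray(phoneme_ids)
    rtfs = np.zeros((num_phonemes, outer_stft.shape[1]), dtype=complex)
    for p in range(num_phonemes):
        mask = phoneme_ids == p
        if not mask.any():
            continue  # phoneme not observed: its RTF stays at zero
        # Cross-spectrum and outer-microphone power, summed over the frames of phoneme p
        cross = np.sum(np.conj(outer_stft[mask]) * in_ear_stft[mask], axis=0)
        power = np.sum(np.abs(outer_stft[mask]) ** 2, axis=0)
        rtfs[p] = cross / (power + eps)
    return rtfs


def simulate_in_ear_stft(outer_stft, phoneme_ids, rtfs):
    """Apply the RTF selected by each frame's phoneme label to the outer-microphone STFT."""
    outer_stft = np.asarray(outer_stft)
    phoneme_ids = np.asarray(phoneme_ids)
    rtfs = np.asarray(rtfs)
    if outer_stft.shape[0] != phoneme_ids.shape[0]:
        raise ValueError("one phoneme label per STFT frame is required")
    # rtfs[phoneme_ids] has shape (frames, bins): one RTF row per frame
    return rtfs[phoneme_ids] * outer_stft
```

In this sketch, a speech-independent model corresponds to the special case `num_phonemes=1` with every frame labeled 0, so a single RTF is fitted and applied to all frames.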

Paper Structure (15 sections, 25 equations, 11 figures)

Figures (11)

  • Figure 1: The own voice signal model for a hearable with two microphones (outer face and in-ear).
  • Figure 2: Overview of the identification and simulation steps of the own voice transfer characteristic models.
  • Figure 3: Simulation of in-ear own voice signals for talker $b$ using the speech-independent model for talker $a$.
  • Figure 4: Simulation of in-ear own voice signals for talker $b$ using the proposed speech-dependent model for talker $a$.
  • Figure 5: The adaptive filtering scheme utilized for estimating in-ear speech signals. The filter coefficients are transferred from identification to simulation directly after each sample-wise adaptation step.
  • ...and 6 more figures