Single Microphone Own Voice Detection based on Simulated Transfer Functions for Hearing Aids

Mathuranathan Mayuravaani, W. Bastiaan Kleijn, Andrew Lensen, Charlotte Sørensen

TL;DR

A data augmentation strategy based on simulated acoustic transfer functions (ATFs) exposes a single-microphone own voice detection model to a wide range of spatial propagation conditions. The model generalizes from simulated to real-world conditions, demonstrating practical viability and pointing toward a promising direction for future hearing aid design.

Abstract

This paper presents a simulation-based approach to own voice detection (OVD) in hearing aids using a single microphone. While OVD can significantly improve user comfort and speech intelligibility, existing solutions often rely on multiple microphones or additional sensors, increasing device complexity and cost. To enable ML-based OVD without requiring costly transfer-function measurements, we propose a data augmentation strategy based on simulated acoustic transfer functions (ATFs) that expose the model to a wide range of spatial propagation conditions. A transformer-based classifier is first trained on analytically generated ATFs and then progressively fine-tuned using numerically simulated ATFs, transitioning from a rigid-sphere model to a detailed head-and-torso representation. This hierarchical adaptation enabled the model to refine its spatial understanding while maintaining generalization. Experimental results show 95.52% accuracy on simulated head-and-torso test data. Under short-duration conditions, the model maintained 90.02% accuracy with one-second utterances. On real hearing aid recordings, the model achieved 80% accuracy without fine-tuning, aided by lightweight test-time feature compensation. This highlights the model's ability to generalize from simulated to real-world conditions, demonstrating practical viability and pointing toward a promising direction for future hearing aid design.
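
As a rough illustration of ATF-based augmentation, the sketch below applies the simplest analytic transfer function, a free-field point source to a point receiver, to a signal at randomly drawn distances. It is not the paper's pipeline: the rigid-sphere and head-and-torso ATFs are not reproduced here, and the function names, the speed of sound, the distance range, and the $e^{-jkr}$ sign convention are all assumptions made for this example.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def free_field_atf(freq_hz, distance_m):
    """Free-field point-source transfer function exp(-j*k*r) / (4*pi*r).

    The sign of the exponent depends on the chosen time convention
    (assumed here: exp(+j*omega*t)).
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    return cmath.exp(-1j * k * distance_m) / (4.0 * math.pi * distance_m)


def apply_free_field(signal, distance_m, sample_rate):
    """Time-domain equivalent of free_field_atf.

    Delays the signal by r/c (rounded to whole samples) and scales it
    by 1 / (4*pi*r).
    """
    delay = round(distance_m / SPEED_OF_SOUND * sample_rate)
    gain = 1.0 / (4.0 * math.pi * distance_m)
    return [0.0] * delay + [gain * x for x in signal]


def augment(signal, sample_rate, n_copies, rng, min_dist=0.5, max_dist=3.0):
    """Return (distance, filtered_signal) pairs for random source distances.

    The distance range is a hypothetical choice for illustration only.
    """
    examples = []
    for _ in range(n_copies):
        r = rng.uniform(min_dist, max_dist)
        examples.append((r, apply_free_field(signal, r, sample_rate)))
    return examples


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [1.0, 0.5, -0.25]  # stand-in for a speech segment
    for r, y in augment(clean, 16000, 3, rng):
        print(f"distance {r:.2f} m -> {len(y)} samples")
```

In a sphere-based or head-and-torso setting, the delay-and-gain step would be replaced by filtering with the corresponding simulated ATF; the sampling loop would stay the same.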

Paper Structure (23 sections, 12 equations, 11 figures, 7 tables)

This paper contains 23 sections, 12 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Geometry of point source and rigid sphere.
  • Figure 2: Mapping the pressure between (a) a point source and a point receiver, (b) a point source and a rigid sphere, (c) a rigid sphere source with vibrating cap and a point receiver, and (d) vibrating spherical cap and location on the rigid sphere. Here, S denotes the source location and M denotes the receiver (microphone) location.
  • Figure 3: Mapping the pressure at an angle on the sphere when (a) a speech signal is from a point source from a distance (external speaker) and (b) a speech signal is from a vibrating cap on the rigid sphere (own voice).
  • Figure 4: Geometry of oscillating cap in a rigid sphere.
  • Figure 5: Directivity pattern $20 \log_{10}( D(\theta)/D(0))$ of the near-field pressure for a straight-ahead point source scattering from a 7 cm rigid sphere and a spherical cap ($\alpha=20^\circ$).
  • ...and 6 more figures