Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

Ryandhimas E. Zezario; Fei Chen; Chiou-Shann Fuh; Hsin-Min Wang; Yu Tsao

TL;DR

This paper tackles non-intrusive prediction of speech intelligibility for hearing aids by introducing two Whisper-based improvements to MBI-Net. MBI-Net+ uses Whisper embeddings for richer cross-domain features, while MBI-Net++ adds a multi-task framework that jointly predicts intelligibility and HASPI using the weighted objective $O = \alpha \cdot \mathcal{L}_{Int} + \beta \cdot \mathcal{L}_{HASPI}$, where $\mathcal{L}_{Int}$ and $\mathcal{L}_{HASPI}$ are the intelligibility and HASPI losses and $\alpha$, $\beta$ are their weights. Experiments on the CPC 2023 Clarity dataset show that Whisper-based features and auxiliary HASPI supervision improve prediction performance over MBI-Net, with MBI-Net++ achieving the best non-intrusive results and ranking highly in the challenge. These findings highlight the value of cross-domain representations and auxiliary metrics for robust hearing-aid intelligibility assessment in real-world scenarios.
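As a rough illustration of the objective $O = \alpha \cdot \mathcal{L}_{Int} + \beta \cdot \mathcal{L}_{HASPI}$, the sketch below combines two per-task losses with scalar weights. It is not the authors' implementation: the summary does not say which loss is used for each task, so mean-squared error is assumed here, and the names `mse` and `multitask_objective` and the default weights are hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-squared error between two equal-length sequences (assumed loss form)."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_objective(
    int_pred: Sequence[float],
    int_true: Sequence[float],
    haspi_pred: Sequence[float],
    haspi_true: Sequence[float],
    alpha: float = 1.0,  # weight on the intelligibility loss (hypothetical default)
    beta: float = 1.0,   # weight on the auxiliary HASPI loss (hypothetical default)
) -> float:
    """Weighted sum O = alpha * L_Int + beta * L_HASPI."""
    loss_int = mse(int_pred, int_true)
    loss_haspi = mse(haspi_pred, haspi_true)
    return alpha * loss_int + beta * loss_haspi


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    total = multitask_objective([0.5, 1.0], [0.0, 1.0], [0.2], [0.0], alpha=1.0, beta=0.5)
    print(total)
```

Setting `beta` to zero reduces the objective to the intelligibility loss alone, which is one way to see what the auxiliary HASPI term contributes.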

Abstract

Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper's embeddings to create cross-domain acoustic features and includes metadata from speech signals by using a classifier that distinguishes different enhancement methods. Furthermore, MBI-Net+ integrates the hearing-aid speech perception index (HASPI) as a supplementary metric into the objective function to further boost prediction performance. Experimental results demonstrate that MBI-Net+ surpasses several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset, validating the effectiveness of incorporating Whisper embeddings, speech metadata, and related complementary metrics to improve prediction performance for HA.

Paper Structure (8 sections, 6 equations, 2 figures, 3 tables)

Introduction
Improved MBI-Net
Experiments
Experimental Setup
Comparing Improved MBI-Net with Original MBI-Net
Effect of the Hearing Loss Model
Comparison with Other Models
Conclusion

Figures (2)

Figure 1: Architecture of the MBI-Net++ model.
Figure 2: Illustration of extracting cross-domain features and estimating frame-level intelligibility scores using the CNN-BLSTM+AT architecture.
