
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids

Dyah A. M. G. Wisnu, Stefano Rini, Ryandhimas E. Zezario, Hsin-Min Wang, Yu Tsao

TL;DR

HAAQI-Net tackles the challenge of non-intrusive, efficient music audio quality assessment for hearing aid users by leveraging BEATs-based features with a BLSTM-attention architecture. It integrates a knowledge-distillation framework to compress the feature extractor and achieve real-time inference while preserving accuracy. The model achieves high correlations with ground-truth scores (LCC ≈ 0.946, SRCC ≈ 0.960) and maintains strong robustness across hearing-loss patterns, signal-processing conditions, and genres, with MOS adaptation showing promising transfer to subjective perception. Its efficiency gains, along with the ability to adapt to subjective scores and withstand SPL variations, position HAAQI-Net as a scalable solution for practical hearing-aid audio quality assessment and related applications.

Abstract

This paper introduces HAAQI-Net, a non-intrusive, deep learning-based music audio quality assessment model for hearing aid users. Unlike traditional methods such as the Hearing Aid Audio Quality Index (HAAQI), which require an intrusive comparison against a reference signal, HAAQI-Net offers a more accessible and computationally efficient alternative. Using a Bidirectional Long Short-Term Memory (BLSTM) architecture with attention mechanisms and features extracted from the pre-trained BEATs model, it predicts HAAQI scores directly from music audio clips and hearing loss patterns. Experimental results demonstrate HAAQI-Net's effectiveness: it achieves a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, while reducing inference time from 62.52 to 2.54 seconds. To address computational overhead, a knowledge distillation strategy was applied, reducing parameters by 75.85% and inference time by 96.46% while maintaining strong performance (LCC: 0.9071, SRCC: 0.9307, MSE: 0.0091). To expand its capabilities, HAAQI-Net was adapted through fine-tuning to predict subjective human scores such as the Mean Opinion Score (MOS). This adaptation significantly improved prediction accuracy, as validated through statistical analysis. Furthermore, the robustness of HAAQI-Net was evaluated under varying Sound Pressure Level (SPL) conditions, revealing optimal performance at a reference SPL of 65 dB, with accuracy gradually decreasing as the SPL deviated from this point. The advancements in subjective score prediction, SPL robustness, and computational efficiency position HAAQI-Net as a scalable solution for music audio quality assessment in hearing aid applications, contributing to efficient and accurate models in audio signal processing and hearing aid technology.
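
The three evaluation metrics named in the abstract (LCC, SRCC, MSE) compare predicted scores against ground-truth HAAQI scores. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names and the example score arrays are hypothetical and chosen only for illustration.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Average the ranks within each group of equal values.
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def lcc(pred, target):
    """Linear (Pearson) correlation coefficient."""
    return float(np.corrcoef(pred, target)[0, 1])


def srcc(pred, target):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(pred), average_ranks(target))


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


# Hypothetical scores in [0, 1], NOT data from the paper.
target_scores = [0.10, 0.35, 0.50, 0.80, 0.95]
predicted_scores = [0.15, 0.30, 0.55, 0.70, 0.90]

lcc_value = lcc(predicted_scores, target_scores)
srcc_value = srcc(predicted_scores, target_scores)
mse_value = mse(predicted_scores, target_scores)
print(f"LCC={lcc_value:.4f} SRCC={srcc_value:.4f} MSE={mse_value:.4f}")
```

LCC measures how linearly the predictions track the targets, SRCC measures only whether the ordering is preserved, and MSE measures the absolute error, so a model can score well on one and poorly on another.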

Paper Structure (27 sections, 12 equations, 11 figures, 6 tables)

Figures (11)

  • Figure 1: The architecture of HAAQI-Net.
  • Figure 2: The architecture of HAAQI-Net with knowledge distillation.
  • Figure 3: Some examples of hearing loss audiograms: the $y$-axis represents the hearing threshold in dB, and the $x$-axis represents the frequency in Hz.
  • Figure 4: Distribution of HAAQI scores for the music samples used to evaluate HAAQI-Net.
  • Figure 5: Scatter plots of music quality prediction of HAAQI-Net using different input features. The dashed diagonal line represents the optimal prediction, while the red line represents the regression line with 95% confidence interval for the model predictions. Data points below the dashed diagonal line indicate that the model's predictions are lower than the true HAAQI scores, while data points above the line indicate that the model's predictions are higher.
  • ...and 6 more figures