Remixing Music for Hearing Aids Using Ensemble of Fine-Tuned Source Separators
Matthew Daly
TL;DR
This work addresses remixing music for hearing-aid users by applying listener-specific gains, guided by audiograms, to four source components: vocals, drums, bass, and other (VDBO). It adopts an ensemble of music source separation (MSS) models (HDemucs, KUIELab-MDX-Net, and DTTNet) that are fine-tuned on Cadenza data to handle crosstalk induced by head-related transfer functions (HRTFs), followed by residual-based refinement and dynamic-range compression, and then NAL-R amplification. The proposed system achieves the top Hearing-Aid Audio Quality Index (HAAQI) score on the evaluation data, with ablations showing that incorporating the residual signal and applying compression significantly improve quality. This demonstrates that fine-tuned ensembles and targeted post-processing can substantially enhance music listening through hearing aids.
Abstract
This paper introduces our system submission for the Cadenza ICASSP 2024 Grand Challenge, which poses the problem of remixing and enhancing music for hearing-aid users. Our system placed first in the challenge, achieving the best average Hearing-Aid Audio Quality Index (HAAQI) score on the evaluation data set. We describe the system, which uses an ensemble of deep learning music source separators that are fine-tuned on the challenge data. We demonstrate the effectiveness of our system through the challenge results and analyze the importance of different system aspects through ablation studies.
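As a rough illustration of the ensemble-and-remix idea, the sketch below averages per-stem estimates from several separators, folds the unexplained residual back into one stem, and remixes with per-stem gains. It is a minimal sketch under stated assumptions, not the submitted system: the function names, the plain unweighted average, the choice to add the residual to the "other" stem, and the gain values are all hypothetical, and the separators themselves, compression, and NAL-R amplification are omitted.

```python
import numpy as np

STEMS = ("vocals", "drums", "bass", "other")  # the VDBO components


def ensemble_average(estimates):
    """Average per-stem estimates from several separators.

    estimates: list of dicts mapping stem name -> array (channels, samples).
    Assumption: an unweighted mean; the actual ensemble weighting may differ.
    """
    return {s: np.mean([e[s] for e in estimates], axis=0) for s in STEMS}


def add_residual(mixture, stems, target="other"):
    """Add whatever the stems fail to explain back into one stem.

    Assumption: the residual goes to `target`; this is an illustrative choice.
    """
    residual = mixture - sum(stems[s] for s in STEMS)
    out = dict(stems)
    out[target] = out[target] + residual
    return out


def remix(stems, gains_db):
    """Sum the stems after applying a per-stem gain given in decibels."""
    return sum(stems[s] * 10.0 ** (gains_db[s] / 20.0) for s in STEMS)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixture = rng.standard_normal((2, 1000))          # stereo toy signal
    model_a = {s: rng.standard_normal((2, 1000)) for s in STEMS}
    model_b = {s: rng.standard_normal((2, 1000)) for s in STEMS}

    stems = add_residual(mixture, ensemble_average([model_a, model_b]))
    # Hypothetical listener-specific gains (dB); real gains come from the task.
    remixed = remix(stems, {"vocals": 3.0, "drums": 0.0, "bass": -3.0, "other": 0.0})
    print(remixed.shape)
```

With the residual added back, the stems sum exactly to the input mixture, so a remix with all gains at 0 dB reproduces the original signal.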
