LIWhiz: A Non-Intrusive Lyric Intelligibility Prediction System for the Cadenza Challenge

Ram C. M. C. Shekar; Iván López-Espejo

TL;DR

LIWhiz tackles lyric intelligibility prediction for listeners, including those with hearing loss, using a non-intrusive approach. It uses a Whisper-based front-end to extract features from both the original and the hearing-loss-simulated audio, and a trainable back-end that fuses these representations through linear mixing, a Bi-LSTM, and a final sigmoid predictor. On the Cadenza CLIP dataset, LIWhiz outperforms both the STOI-based and Whisper-based baselines, achieving an RMSE of around 27% and an NCC of around 0.65 on the evaluation set, with ablations showing the benefit of including the original audio. The results point to a robust, non-intrusive method for lyric intelligibility prediction that could support lyric enhancement and improved accessibility in music listening.

Abstract

We present LIWhiz, a non-intrusive lyric intelligibility prediction system submitted to the ICASSP 2026 Cadenza Challenge. LIWhiz leverages Whisper for robust feature extraction and a trainable back-end for score prediction. Tested on the Cadenza Lyric Intelligibility Prediction (CLIP) evaluation set, LIWhiz achieves a root mean square error (RMSE) of 27.07%, a 22.4% relative RMSE reduction over the STOI-based baseline, and also yields a substantial improvement in normalized cross-correlation (NCC).
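
To make the reported metrics concrete, the following minimal sketch shows how RMSE, NCC, and a relative RMSE reduction could be computed. It assumes that NCC is the Pearson correlation between predicted and listener scores and that scores are on a 0 to 100 percentage scale. The function names and the toy scores are hypothetical and are not taken from the challenge data or tooling.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equal-length score lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def ncc(pred, target):
    """Normalized cross-correlation, assumed here to be the Pearson correlation."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def relative_reduction(baseline_rmse, system_rmse):
    """Fraction by which the system lowers the baseline RMSE."""
    return (baseline_rmse - system_rmse) / baseline_rmse


# Hypothetical toy scores (percent of words correctly identified).
predicted = [10.0, 50.0, 90.0]
listener = [20.0, 40.0, 100.0]

toy_rmse = rmse(predicted, listener)
toy_ncc = ncc(predicted, listener)
toy_reduction = relative_reduction(40.0, 30.0)
```

Read the other way, the reported numbers imply a STOI-based baseline RMSE of 27.07 / (1 - 0.224), roughly 34.9%, up to rounding. That figure is only derived from the two values stated above; the text does not report it directly.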
