GESI: Gammachirp Envelope Similarity Index for Predicting Intelligibility of Simulated Hearing Loss Sounds
Ayako Yamamoto, Toshio Irino, Fuki Miyazaki, Honoka Tamaru
TL;DR
The Gammachirp Envelope Similarity Index (GESI) is an intrusive objective intelligibility measure (OIM) that predicts the speech intelligibility (SI) of simulated hearing loss (HL) sounds presented to normal-hearing (NH) listeners. It is built on frame-based gammachirp filterbank (GCFB) processing, an IIR modulation filterbank, and a modified cosine similarity with a tunable level-difference parameter ρ. It also adds an SSI-based frame-wise weight to separate vocal-tract from glottal contributions, and it is sensitive to the HL profile of a hearing-impaired (HI) listener as expressed in the audiogram and in cochlear dysfunction. Across laboratory and remote experiments with male and female speech, with HL simulated by WHIS, GESI outperforms STOI, ESTOI, MBSTOI, HASPIv1, and HASPIv2 in predicting both average and individual SI without relying on extensive training data. It can be adapted to listening conditions through ρ and tone-pip-derived estimates. These results support using GESI as a practical tool for evaluating speech enhancement (SE) algorithms in assistive devices, and they highlight its potential for crowdsourced validation and per-listener SI prediction.
Abstract
We propose an objective intelligibility measure (OIM), called the Gammachirp Envelope Similarity Index (GESI), which can predict the speech intelligibility (SI) of simulated hearing loss (HL) sounds for normal-hearing (NH) listeners. GESI is an intrusive method that computes the SI metric using the gammachirp filterbank (GCFB), the modulation filterbank, and the extended cosine similarity measure. The unique features of GESI are that i) it reflects the hearing-impaired (HI) listener's HL that appears in the audiogram and is caused by active and passive cochlear dysfunction, ii) it provides a single goodness metric, as in the widely used STOI and ESTOI, that can be used immediately to evaluate speech enhancement (SE) algorithms, and iii) it provides a simple control parameter to accommodate level asymmetry between the reference and test sounds and to deal with individual listening conditions and environments. We evaluated GESI and the conventional OIMs STOI, ESTOI, MBSTOI, and HASPI versions 1 and 2 in four SI experiments on words spoken by male and female speakers, conducted in both laboratory and remote environments. GESI outperformed the other OIMs in these evaluations. GESI could be used to improve SE algorithms in assistive listening devices for individual HI listeners.
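To illustrate how a single control parameter can accommodate level asymmetry between the reference and test sounds, the following is a minimal sketch of one plausible form of an extended cosine similarity. The exact definition is not given in this text, so the formula, the function name `extended_cosine_similarity`, and the argument names are assumptions made for illustration: the denominator weights the reference energy by an exponent `rho` and the test energy by `1 - rho`, so that `rho = 0.5` reduces to the ordinary cosine similarity.

```python
def extended_cosine_similarity(ref, test, rho=0.5):
    """Cosine-like similarity with an asymmetry weight rho (assumed form).

    ref, test: equal-length sequences of non-negative envelope samples
               (hypothetical stand-ins for modulation-filterbank outputs).
    rho:       weight in [0, 1]; 0.5 gives the standard cosine similarity.
    """
    if len(ref) != len(test):
        raise ValueError("ref and test must have the same length")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")

    cross = sum(r * t for r, t in zip(ref, test))   # inner product
    e_ref = sum(r * r for r in ref)                 # reference energy
    e_test = sum(t * t for t in test)               # test energy
    if e_ref == 0.0 or e_test == 0.0:
        return 0.0                                  # no signal, no similarity

    # Assumed extension: unequal exponents make the score level-sensitive.
    return cross / (e_ref ** rho * e_test ** (1.0 - rho))


# Hypothetical example: the test envelope is the reference attenuated by half.
ref = [1.0, 2.0, 3.0]
test = [0.5, 1.0, 1.5]
score_symmetric = extended_cosine_similarity(ref, test, rho=0.5)
score_ref_weighted = extended_cosine_similarity(ref, test, rho=1.0)
```

Under this assumed form, the symmetric setting ignores the overall level difference, whereas weighting the reference more heavily lowers the score for an attenuated test sound, which is the kind of behaviour a level-difference parameter would need in order to reflect audibility loss.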
