GESI: Gammachirp Envelope Similarity Index for Predicting Intelligibility of Simulated Hearing Loss Sounds

Ayako Yamamoto, Toshio Irino, Fuki Miyazaki, Honoka Tamaru

TL;DR

GESI introduces the Gammachirp Envelope Similarity Index, an intrusive OIM that predicts the SI of simulated HL sounds for NH listeners as a proxy for the SI of HI listeners. It is built on frame-based GCFB processing, an IIR modulation filterbank, and a modified cosine similarity with a tunable level-difference parameter ρ. It also adds an SSI-based frame-wise weight to separate vocal-tract from glottal contributions, which improves its sensitivity to HL profiles expressed in audiograms and to cochlear dysfunction. In laboratory and remote experiments with male and female speech, with HL simulated by WHIS, GESI outperforms STOI, ESTOI, MBSTOI, HASPIv1, and HASPIv2 in predicting both average and individual SI without relying on extensive training data, and it can adapt to listening conditions through ρ and tone-pip-derived estimates. These results support using GESI as a practical front-end for evaluating SE algorithms in assistive devices and point to its potential for crowdsourced validation and per-listener SI prediction.
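
The modulation-analysis stage mentioned above can be illustrated with a short sketch. It assumes that a temporal envelope has already been extracted from one GCFB channel at sampling rate fs_env, and it splits that envelope with octave-spaced second-order IIR band-pass filters (RBJ audio-EQ cookbook design, Q = 1). The center frequencies, Q, filter design, and function names are hypothetical choices for illustration, not the parameters or code of GESI.

```python
import numpy as np


def bandpass_biquad(fc, q, fs):
    """Second-order IIR band-pass coefficients (RBJ cookbook, 0 dB peak gain).

    Returns (b, a) normalized so that a[0] == 1.
    """
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]


def iir_filter(b, a, x):
    """Run the biquad difference equation (direct form I) over x."""
    y = np.zeros_like(x, dtype=float)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def modulation_filterbank(envelope, fs_env, centers=(1, 2, 4, 8, 16, 32), q=1.0):
    """Split one channel's envelope into modulation bands (hypothetical parameters).

    Returns an array of shape (len(centers), len(envelope)).
    """
    envelope = np.asarray(envelope, dtype=float)
    bands = []
    for fc in centers:
        b, a = bandpass_biquad(fc, q, fs_env)
        bands.append(iir_filter(b, a, envelope))
    return np.stack(bands)


# Example: an envelope fluctuating at 4 Hz should excite the 4 Hz band most.
fs_env = 1000
t = np.arange(2 * fs_env) / fs_env
env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
bands = modulation_filterbank(env, fs_env)
band_rms = np.sqrt(np.mean(bands ** 2, axis=1))
```

In a full measure, this step would be applied to every GCFB channel of both the reference and the test sound before their envelopes are compared band by band.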

Abstract

We propose an objective intelligibility measure (OIM), called the Gammachirp Envelope Similarity Index (GESI), which can predict the speech intelligibility (SI) of simulated hearing loss (HL) sounds for normal-hearing (NH) listeners. GESI is an intrusive method that computes the SI metric using the gammachirp filterbank (GCFB), the modulation filterbank, and the extended cosine similarity measure. The unique features of GESI are that i) it reflects the hearing-impaired (HI) listener's HL that appears in the audiogram and is caused by active and passive cochlear dysfunction, ii) it provides a single goodness metric, as do the widely used STOI and ESTOI, that can be used immediately to evaluate speech enhancement (SE) algorithms, and iii) it provides a simple control parameter to accommodate the level asymmetry between the reference and test sounds and to deal with individual listening conditions and environments. We evaluated GESI and the conventional OIMs (STOI, ESTOI, MBSTOI, and HASPI versions 1 and 2) using four SI experiments on words spoken by male and female speakers in both laboratory and remote environments. GESI outperformed the other OIMs in these evaluations. GESI could be used to improve SE algorithms in assistive listening devices for individual HI listeners.
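
The third feature, a single control parameter for the level asymmetry between reference and test sounds, can be sketched as a level-weighted cosine similarity. The form used below, S = Σ r·t / ((Σ r²)^ρ (Σ t²)^(1−ρ)), is an assumption chosen to match the description: it reduces to the ordinary cosine similarity at ρ = 0.5 and penalizes a drop in test-sound level when ρ > 0.5. The pooling helper average_similarity, which turns band-wise scores into one number, is likewise hypothetical.

```python
import numpy as np


def modified_cosine_similarity(ref, test, rho=0.5, eps=1e-12):
    """Level-weighted cosine similarity (assumed form, not GESI's code).

    S = sum(ref * test) / (sum(ref**2)**rho * sum(test**2)**(1 - rho))
    rho = 0.5 gives the ordinary cosine similarity.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    num = np.sum(ref * test)
    den = np.sum(ref ** 2) ** rho * np.sum(test ** 2) ** (1.0 - rho)
    return float(num / max(den, eps))


def average_similarity(ref_bands, test_bands, rho=0.5, weights=None):
    """Hypothetical pooling: (weighted) mean of per-band similarities.

    ref_bands, test_bands: arrays of shape (n_bands, n_samples).
    """
    scores = np.array([
        modified_cosine_similarity(r, t, rho)
        for r, t in zip(ref_bands, test_bands)
    ])
    return float(np.average(scores, weights=weights))
```

If the test sound is the reference scaled by a gain g, this form gives S = g^(2ρ−1): with ρ = 0.5 the gain is ignored entirely, while ρ = 0.6 and g = 0.5 give 0.5^0.2 ≈ 0.87, so a quieter test sound is scored as less intelligible.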

Paper Structure (59 sections, 7 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Block diagram of GESI
  • Figure 2: RMS digital level of a sequence of 15 tone pips decreasing in steps of 5 dB. The right y-axis shows the SPL when the first pip is assumed to be 64 dB SPL.
  • Figure 3: Subjective SI results: Mean and standard deviation (SD) of word correct rate (%) across listeners.
  • Figure 4: Scatter plot of the mean reported number of audible tone pips, $\bar{N}_{pip}$, versus the mean SRT value (dB) for the male and female speech experiments. The conditions were unprocessed (black), low-level (green), 70-yr (blue), and 80-yr (red). The panels correspond to the experiments in Fig. 3. Each point represents an individual listener. The solid lines are the regression results.
  • Figure 5: SI prediction results in Eval.1. For comparison, the human subjective results of the male speech experiments in the laboratory (a) and remote (b) environments, identical to Figs. 3(a) and 3(b), are reproduced here. SI predictions by STOI (c), GESI ($\rho=0.55$) (d), GESI ($\rho=0.60$) (e), ESTOI (f), HASPIv1 (g), HASPIv2 (h), and MBSTOI (i). Values are the mean and standard deviation (SD) across participants and words.
  • ...and 2 more figures
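
Figures 2 and 4 relate the tone-pip sequence to speech reception thresholds. The sketch below assumes that the k-th pip is 5(k−1) dB below the first pip, which is set to 64 dB SPL as in Figure 2; that a listener's reported count of audible pips maps to the level of the softest pip heard; and that the regression lines in Figure 4 are ordinary least-squares fits. The helper names and the count-to-level mapping are hypothetical illustrations, not the paper's procedure.

```python
import numpy as np

FIRST_PIP_SPL_DB = 64.0  # level assumed for the first pip (Figure 2)
STEP_DB = 5.0            # level decrement between successive pips
N_PIPS = 15              # number of pips in the sequence


def pip_levels_spl(first=FIRST_PIP_SPL_DB, step=STEP_DB, n=N_PIPS):
    """SPL (dB) of each pip, from the first (loudest) to the last (softest)."""
    return first - step * np.arange(n)


def softest_audible_level(n_audible, first=FIRST_PIP_SPL_DB, step=STEP_DB):
    """Hypothetical mapping from a (possibly mean, hence fractional) count of
    audible pips to the SPL of the softest pip heard."""
    if not 1.0 <= n_audible <= N_PIPS:
        raise ValueError("n_audible must lie between 1 and N_PIPS")
    return first - step * (n_audible - 1.0)


def fit_regression_line(n_pip, srt_db):
    """Ordinary least-squares line: SRT = slope * N_pip + intercept."""
    slope, intercept = np.polyfit(np.asarray(n_pip, float),
                                  np.asarray(srt_db, float), 1)
    return float(slope), float(intercept)
```

Under this mapping, a listener who reports hearing 10 pips has heard down to 64 − 5 × 9 = 19 dB SPL; the last of the 15 pips sits at −6 dB SPL.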