Harmonic Detection from Noisy Speech with Auditory Frame Gain for Intelligibility Enhancement

A. Queiroz, R. Coelho

TL;DR

This work tackles speech intelligibility degradation in noise by introducing HDAG (Harmonic Detection for Auditory Gain). The method combines fundamental-frequency (F0) estimation via HHT‑Amp, FSFFE‑based harmonic adjustment, a third‑octave gammachirp filterbank, and frame‑wise gain to amplify harmonic components. Objective results using ESTOI, ASII$_{ST}$, and PESQ show that HDAG outperforms three baseline techniques across six background noises and four SNRs, a finding corroborated by a perceptual listening test. The approach offers a principled way to exploit harmonic structure for intelligibility gains in noisy speech applications, while highlighting computational considerations for real‑time use.

Abstract

This paper introduces a novel method, HDAG (Harmonic Detection for Auditory Gain), for speech intelligibility enhancement in noisy scenarios. In the proposed scheme, a series of selective gammachirp filters is adopted to emphasize the harmonic components of speech, reducing the masking effects of acoustic noise. The fundamental frequency is estimated by the HHT-Amp technique. Harmonic patterns estimated with low accuracy are detected and adjusted according to the FSFFE low/high pitch separation. The central frequencies of the filterbank are defined considering the third-octave subbands, which are best suited to cover the regions most relevant to intelligibility. Before signal reconstruction, the gammachirp-filtered components are amplified by gain factors regulated by the FSFFE classification. The proposed HDAG solution and three baseline techniques are examined considering six background noises with four signal-to-noise ratios. Three objective measures are adopted for the evaluation of speech intelligibility and quality. Several experiments are conducted to demonstrate that the proposed scheme achieves better speech intelligibility improvement than the competing approaches. A perceptual listening test is further considered and corroborates the objective results.
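As a rough illustration of the third-octave spacing mentioned in the abstract, the sketch below computes base-2 third-octave centre frequencies and their band edges. The 125 Hz to 8 kHz range, the 1 kHz reference, and the function names are assumptions made for illustration; they are not values or code taken from the paper, and the gammachirp filters themselves are not implemented here.

```python
import math


def third_octave_centres(f_low=125.0, f_high=8000.0, f_ref=1000.0):
    """Return base-2 third-octave centre frequencies within [f_low, f_high].

    Centres follow f_c = f_ref * 2**(k / 3) for integer k.
    The default range and reference are hypothetical, not from the paper.
    """
    eps = 1e-9  # tolerance so that exact band limits are not lost to rounding
    k_min = math.ceil(3 * math.log2(f_low / f_ref) - eps)
    k_max = math.floor(3 * math.log2(f_high / f_ref) + eps)
    return [f_ref * 2 ** (k / 3) for k in range(k_min, k_max + 1)]


def band_edges(fc):
    """Lower and upper edge of the third-octave band centred at fc."""
    return fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)


centres = third_octave_centres()
for fc in centres:
    lo, hi = band_edges(fc)
    print(f"{fc:8.1f} Hz  [{lo:8.1f}, {hi:8.1f}]")
```

Adjacent bands share an edge, so the bands tile the frequency axis without gaps; a filterbank built on these centres would place one filter per band.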

Paper Structure

This paper contains 16 sections, 20 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Block diagram of the proposed HDAG method for improving the intelligibility of noisy speech signals.
  • Figure 2: Block diagram of the FSFFE technique for low/high pitch classification of speech frames.
  • Figure 3: Ground truth and F0 estimated with the HHT-Amp technique for: (a) a clean speech segment, (b) a noisy signal with babble noise at SNR = -5 dB, and (c) the same noisy segment with estimates improved by FSFFE.
  • Figure 4: ESTOI curves of (a) low-pitch and (b) high-pitch frames, averaged over SNR values of -10 dB, -5 dB, 0 dB, and 5 dB of babble noise, as a function of the gain factor $G_k$ for each gammachirp filter.
  • Figure 5: $\Delta$ASII$_\text{ST}$ intelligibility enhancement [$\times$10$^{-2}$] averaged for speech signals corrupted by noises: (a) Babble, (b) Cafeteria, (c) Traffic, (d) Train, (e) Helicopter and (f) SSN.
  • ...and 1 more figure