Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks

Mahmoud Salhab; Haidar Harmanani

TL;DR

The paper addresses speech bandwidth expansion by learning an end-to-end high-fidelity GAN that maps narrowband to wideband speech across multiple upsampling ratios $\boldsymbol{s}>1$. It introduces a unified generator-discriminator architecture inspired by HiFi-GAN, utilizing a generator with transposed convolutions and multi-receptive-field blocks, along with MSD and MPD discriminators, trained with adversarial, mel-spectrogram reconstruction, and feature-matching losses. Key contributions include a single model capable of handling several upsampling ratios, zero-shot generalization to unseen ratios, and empirical improvements over end-to-end baselines while remaining competitive with cascaded NVSR methods, particularly at higher ratios, evaluated on the VCTK dataset using LSD as the metric. The results suggest practical benefits for real-world speech enhancement tasks, offering a simpler, scalable approach with robust performance across varying bandwidth expansion factors.
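
The three loss terms named above can be written out roughly as follows. This is a sketch that follows the published HiFi-GAN formulation the model is said to be inspired by; the least-squares form of the adversarial term, the symbols ($x$ for the narrowband input, $y$ for the wideband target, $\phi$ for the mel-spectrogram transform, $D^{i}$ for the features of the $i$-th discriminator layer with $N_i$ elements, $D_k$ for the sub-discriminators of MSD and MPD), and the weights $\lambda_{fm}$ and $\lambda_{mel}$ are assumptions, not values taken from this paper.

```latex
\begin{aligned}
\mathcal{L}_{adv}(D;G) &= \mathbb{E}_{(x,y)}\left[(D(y)-1)^2 + D(G(x))^2\right] \\
\mathcal{L}_{adv}(G;D) &= \mathbb{E}_{x}\left[(D(G(x))-1)^2\right] \\
\mathcal{L}_{mel}(G)   &= \mathbb{E}_{(x,y)}\left[\lVert \phi(y) - \phi(G(x)) \rVert_1\right] \\
\mathcal{L}_{fm}(G;D)  &= \mathbb{E}_{(x,y)}\left[\sum_{i=1}^{T} \frac{1}{N_i}\,\lVert D^{i}(y) - D^{i}(G(x)) \rVert_1\right] \\
\mathcal{L}_{G}        &= \sum_{k}\left[\mathcal{L}_{adv}(G;D_k) + \lambda_{fm}\,\mathcal{L}_{fm}(G;D_k)\right] + \lambda_{mel}\,\mathcal{L}_{mel}(G)
\end{aligned}
```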

Abstract

Speech bandwidth expansion extends the frequency range of low-bandwidth speech signals, thereby improving audio quality, clarity, and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach using a high-fidelity generative adversarial network. Unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates various bandwidth upsampling ratios into a single unified model specifically designed for speech bandwidth expansion applications. Our approach exhibits robust performance across various bandwidth expansion factors, including those not encountered during training, demonstrating zero-shot capability. To the best of our knowledge, this is the first work to showcase this capability. The experimental results demonstrate that our method outperforms previous end-to-end approaches, as well as interpolation and traditional techniques, showcasing its effectiveness in practical speech enhancement applications.
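
To make the notion of paired narrowband and wideband signals concrete, the sketch below derives a narrowband signal from a wideband one by low-pass filtering and keeping every $s$-th sample. It is only an illustration under stated assumptions: the paper's actual data preparation is not described here, and the function names `lowpass_taps` and `make_narrowband`, the Hamming-windowed sinc filter, the 63-tap length, and the restriction to integer ratios are all hypothetical choices.

```python
import math


def lowpass_taps(s, num_taps=63):
    """Hamming-windowed sinc low-pass filter (assumed design, not from the paper).

    The cutoff is the Nyquist frequency of the downsampled signal,
    i.e. 0.5 / s cycles per input sample.
    """
    fc = 0.5 / s
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]  # normalise to unity gain at DC


def make_narrowband(wideband, s, num_taps=63):
    """Return the narrowband partner of `wideband` for an integer ratio s >= 1."""
    if not isinstance(s, int) or s < 1:
        raise ValueError("this sketch only handles integer ratios s >= 1")
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a centre tap")
    taps = lowpass_taps(s, num_taps)
    half = (num_taps - 1) // 2
    narrowband = []
    # Evaluate the filter only at the samples that survive decimation.
    for start in range(0, len(wideband), s):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = start + half - k  # centred convolution, so no added delay
            if 0 <= idx < len(wideband):
                acc += h * wideband[idx]
        narrowband.append(acc)
    return narrowband
```

With this sketch, a training pair for ratio $s$ would be `(make_narrowband(y, s), y)` for a wideband utterance `y`; a model trained on several values of $s$ sees inputs of different lengths for the same target.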

Paper Structure (13 sections, 9 equations, 3 figures, 1 table)

Introduction
Related Work
Methodology
Model
Generator
Discriminator
Training Loss
Experiments
Dataset
Evaluation Metric
Experimental Setup
Results
Conclusion

Figures (3)

Figure 1: Complete architecture of the model.
Figure 2: Spectrogram analysis of narrowband-to-wideband speech reconstruction with varying upsampling ratios ($\mathbf{s}=8$, $\mathbf{s}=4$, $\mathbf{s}=2$).
Figure 3: Performance comparison of our unified model across various upsampling ratios, demonstrating its ability to handle unseen upsampling ratios with maintained low Log Spectral Distance (LSD) compared to traditional interpolation methods.
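
For readers unfamiliar with the metric in Figure 3, the sketch below computes LSD using the definition commonly found in the bandwidth-extension literature: the root-mean-square difference of log power spectra over frequency, averaged over frames. The exact variant used in the paper may differ; the function names, the Hann window, the frame length, the hop size, the `eps` floor, and the naive DFT (chosen to keep the example dependency-free) are all assumptions.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=64, hop=32):
    """Power spectrum of each Hann-windowed frame, computed with a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames


def log_spectral_distance(reference, estimate, frame_len=64, hop=32, eps=1e-10):
    """Mean over frames of the RMS log10-power difference over frequency bins."""
    ref = power_spectrogram(reference, frame_len, hop)
    est = power_spectrogram(estimate, frame_len, hop)
    if not ref or len(ref) != len(est):
        raise ValueError("signals must have equal length of at least one frame")
    total = 0.0
    for ref_bins, est_bins in zip(ref, est):
        squared = [
            (math.log10(r + eps) - math.log10(e + eps)) ** 2
            for r, e in zip(ref_bins, est_bins)
        ]
        total += math.sqrt(sum(squared) / len(squared))
    return total / len(ref)
```

Lower values mean the reconstructed spectrum is closer to the wideband reference. Identical signals give 0, and scaling the estimate by a factor of 10 shifts every log power by 2, which gives an LSD of about 2 under this definition.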
