Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN

Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada, Shoji Makino

TL;DR

This paper targets single-channel speech enhancement by challenging the necessity of precise phase reconstruction. It introduces unrestricted global phase bias (UPB) and applies it to a Conformer-based Metric GAN (CMGAN), replacing the traditional phase-focused losses with UPB-oriented objectives and components. The authors propose a UPB-aware loss, a magnitude-weighted UPB loss, a UPB-aware discriminator, and UPB-based data augmentation, all designed to leverage the perceptual insensitivity to global phase shifts. On VoiceBank-DEMAND, the UPB-CMGAN achieves state-of-the-art results without additional computational cost, with ablations confirming the contribution of each UPB component.

Abstract

With the rapid development of neural networks in recent years, many networks have become highly effective at enhancing the magnitude spectrum of noisy speech in single-channel speech enhancement. However, enhancing the phase spectrum with neural networks is often ineffective and remains a challenging problem. In this paper, we find that the human ear is not sensitive to the difference between a precise phase spectrum and a biased phase (BP) spectrum. We therefore propose an optimization method for phase reconstruction that allows freedom in the global phase bias instead of requiring reconstruction of the precise phase spectrum. We apply it to a Conformer-based Metric Generative Adversarial Network (CMGAN) baseline model, relaxing the existing precise-phase constraint and giving the neural network a broader learning space. Results show that this method achieves new state-of-the-art performance without incurring additional computational overhead.
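
To make the idea of "freedom in the global phase bias" concrete, here is a minimal NumPy sketch. It is not the authors' UPB loss, whose exact form is not given in this summary. It assumes the bias is a single constant phase offset added to every time-frequency bin, and it uses a generic least-squares alignment: the offset that best matches the estimate to the reference has a closed form, so the error can be measured after removing that offset. The function name `global_phase_invariant_mse` is hypothetical.

```python
import numpy as np


def global_phase_invariant_mse(est, ref):
    """MSE between two complex spectrograms after removing one global phase offset.

    Illustrative only: this is a generic least-squares alignment, not the
    loss defined in the paper.
    """
    est = np.asarray(est, dtype=complex)
    ref = np.asarray(ref, dtype=complex)
    # np.vdot flattens both arrays and conjugates the first argument,
    # giving sum(conj(est) * ref).
    inner = np.vdot(est, ref)
    # The rotation theta minimizing ||exp(j*theta) * est - ref||^2
    # is the angle of that inner product.
    theta = np.angle(inner)
    aligned = est * np.exp(1j * theta)
    loss = np.mean(np.abs(aligned - ref) ** 2)
    return loss, theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
    # Same magnitudes as ref, but every bin rotated by a constant -0.7 rad.
    est = ref * np.exp(-1j * 0.7)
    plain = np.mean(np.abs(est - ref) ** 2)
    loss, theta = global_phase_invariant_mse(est, ref)
    print("plain MSE:", plain)
    print("bias-invariant MSE:", loss, "recovered offset:", theta)
```

In this toy case the two spectrograms have identical magnitudes and differ only by a constant phase offset. A plain complex MSE penalizes that difference, while the bias-invariant version treats the two as equivalent, which is the kind of relaxation the abstract describes.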

Paper Structure (15 sections, 17 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Three different types of speech enhancement models
  • Figure 2: Comparison of original phase reconstructed signal and biased phase reconstructed signal (partial)