MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

TL;DR

This study proposes MetricGAN+, which adds three training techniques that incorporate domain knowledge of speech processing. Together they increase the PESQ score by 0.3 over the previous MetricGAN and achieve state-of-the-art results.

Abstract

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics that consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which adds three training techniques that incorporate domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ can increase the PESQ score by 0.3 compared to the previous MetricGAN and achieve state-of-the-art results (PESQ score = 3.15).

Paper Structure

This paper contains 13 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Training flow of MetricGAN.
  • Figure 2: Sigmoid function with different $\alpha$.
  • Figure 3: Learned values of $\alpha$ in learnable sigmoid function.
  • Figure 4: Learning curves of different settings (structure of $G$ is fixed).
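
To give a feel for what Figure 2 describes, the sketch below evaluates a sigmoid whose steepness is set by a slope parameter $\alpha$. It assumes the common form $1 / (1 + e^{-\alpha x})$; the paper's exact parameterization (for example, any output scaling) is not given in this summary and may differ. The function name `sigmoid_with_slope` and the sample values of $\alpha$ are illustrative choices, not taken from the paper.

```python
import math


def sigmoid_with_slope(x: float, alpha: float = 1.0) -> float:
    """Sigmoid with slope parameter alpha (assumed form: 1 / (1 + exp(-alpha * x)))."""
    return 1.0 / (1.0 + math.exp(-alpha * x))


# Compare a few illustrative slopes at the same inputs.
# Larger alpha gives a sharper transition around x = 0.
for alpha in (0.5, 1.0, 2.0):
    values = [round(sigmoid_with_slope(x, alpha), 3) for x in (-2.0, 0.0, 2.0)]
    print(f"alpha={alpha}: {values}")
```

In a learnable variant such as the one named in Figure 3, $\alpha$ would be a trainable parameter rather than a fixed constant; here it is fixed only to keep the sketch self-contained.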