Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network

Yanan Chen, Zihao Cui, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

TL;DR

This paper presents a novel weighting prediction approach that explicitly learns task relationships from downstream training information to address the core challenge of universal speech enhancement, and introduces a novel speech enhancement network, the Plugin Speech Enhancement (Plugin-SE).

Abstract

Deploying a single universal neural network for speech enhancement, with the aim of improving noise robustness across diverse speech processing tasks, is challenging because static speech enhancement frameworks are unaware of the speech that downstream modules expect. This limitation prevents static speech enhancement approaches from achieving optimal performance across a range of speech processing tasks, which undermines their claim to universal applicability. The fundamental issue in achieving universal speech enhancement is how to inform the speech enhancement module about the features of downstream modules. In this study, we present a novel weighting prediction approach that explicitly learns task relationships from downstream training information to address this core challenge. We find that the decision whether to employ data augmentation techniques is a crucial piece of downstream training information: it significantly affects the expected speech and the performance of the speech enhancement module. Moreover, we introduce a novel speech enhancement network, the Plugin Speech Enhancement (Plugin-SE), a dynamic neural network comprising a speech enhancement module, a gate module, and a weight prediction module. Experimental results demonstrate that the proposed Plugin-SE approach is competitive with or superior to other joint training methods across various downstream tasks.
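To make the three-module structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the gate is a single scalar in [0, 1] that blends the enhanced signal with the unprocessed input, and that the weight prediction module derives this scalar from one piece of downstream training information (whether data augmentation was used). The function names `predict_gate`, `plugin_enhance`, and `toy_enhancer`, the gate values, and the direction of the augmentation effect are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def predict_gate(uses_augmentation: bool) -> float:
    """Hypothetical weight predictor: downstream training info -> gate in [0, 1].

    Assumption (not from the paper): a downstream model trained with data
    augmentation tolerates noise better, so it receives a smaller gate.
    The numeric values are illustrative placeholders.
    """
    return 0.3 if uses_augmentation else 0.9


def plugin_enhance(
    noisy: Sequence[float],
    enhance: Callable[[Sequence[float]], List[float]],
    gate: float,
) -> List[float]:
    """Blend enhanced and noisy samples: gate * enhanced + (1 - gate) * noisy.

    The scalar blend is an assumed form of the gate module.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    enhanced = enhance(noisy)
    return [gate * e + (1.0 - gate) * x for e, x in zip(enhanced, noisy)]


def toy_enhancer(noisy: Sequence[float]) -> List[float]:
    """Stand-in for a trained speech enhancement network: halves every sample."""
    return [0.5 * x for x in noisy]


# The gate is fixed once per downstream task, before any speech is processed.
gate_for_task = predict_gate(uses_augmentation=True)
output = plugin_enhance([1.0, -2.0, 4.0], toy_enhancer, gate_for_task)
```

With `gate = 0` the input passes through unchanged, and with `gate = 1` the output is the fully enhanced signal, so under these assumptions one enhancement network can serve downstream tasks that expect different amounts of processing.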

Paper Structure (13 sections, 11 equations, 6 figures, 5 tables)


Figures (6)

  • Figure 1: The inference stage of static speech enhancement and proposed plugin speech enhancement. (a) the static speech enhancement, (b) the plugin speech enhancement, as a kind of dynamic speech enhancement.
  • Figure 2: The inference stage of the plugin speech enhancement for a single task. The gate parameter is determined before the speech processing.
  • Figure 3: The relationship between output-to-input ratio and SV scores.
  • Figure 4: The SV scores for methods on different test datasets. (vox1_O_cleaned is used for (a) and (d), vox1_E_cleaned is used for (b) and (e), and vox1_H_cleaned is used for (c) and (f).)
  • Figure 5: The Hubert scores for methods on noisy speech.
  • ...and 1 more figure