Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

Hsin-Tien Chiang; Hao Zhang; Yong Xu; Meng Yu; Dong Yu

TL;DR

This work proposes a novel approach called Restorative SE (RestSE), which combines a lightweight SE module with a generative codec module to progressively enhance and restore speech quality.

Abstract

In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often over-suppress speech, which introduces audible artifacts and harms downstream task performance. To overcome these limitations, we propose a novel approach called Restorative SE (RestSE), which combines a lightweight SE module with a generative codec module to progressively enhance and restore speech quality. The SE module first reduces noise; the codec module then performs dereverberation and restores speech using its generative capabilities. We systematically explore various quantization techniques within the codec module to optimize performance. Additionally, we introduce a weighted loss function and a feature fusion layer that merges the SE output with the original mixture, particularly in segments where the SE output is heavily distorted. Experimental results demonstrate the effectiveness of the proposed method in enhancing speech quality under adverse conditions. Audio demos are available at: https://sophie091524.github.io/RestorativeSE/.
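To make the idea of emphasizing heavily distorted segments concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hand-crafted, waveform-domain proxy for distortion (the fraction of frame energy removed by the SE stage), whereas the paper describes a feature fusion layer inside the codec module. All names (`frame_energy`, `suppression_weights`, `fuse`, `weighted_l1`) and the parameters `frame_len` and `alpha` are hypothetical.

```python
import numpy as np

def frame_energy(x, frame_len):
    """Mean power of each non-overlapping frame (trailing samples are dropped)."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def suppression_weights(mixture, se_output, frame_len=256, eps=1e-8):
    """Per-frame weight in [0, 1]; near 1 where the SE stage removed most energy.

    Assumption: removed energy is used as a stand-in for 'heavily distorted'.
    """
    e_mix = frame_energy(mixture, frame_len)
    e_se = frame_energy(se_output, frame_len)
    kept = np.clip(e_se / (e_mix + eps), 0.0, 1.0)
    return 1.0 - kept

def fuse(mixture, se_output, frame_len=256):
    """Blend the mixture back in on frames where the SE output was suppressed."""
    w = np.repeat(suppression_weights(mixture, se_output, frame_len), frame_len)
    n = len(w)
    return (1.0 - w) * se_output[:n] + w * mixture[:n]

def weighted_l1(estimate, target, weights, frame_len=256, alpha=1.0):
    """Frame-wise L1 error, up-weighted by (1 + alpha * weight) per frame."""
    n_frames = len(weights)
    n = n_frames * frame_len
    err = np.abs(estimate[:n] - target[:n]).reshape(n_frames, frame_len).mean(axis=1)
    return float(np.mean((1.0 + alpha * weights) * err))

# Toy signal: the SE stage keeps the first frame and zeroes out the second.
mixture = np.ones(512)
se_output = np.concatenate([np.ones(256), np.zeros(256)])
weights = suppression_weights(mixture, se_output)
fused = fuse(mixture, se_output)
loss = weighted_l1(np.zeros(512), np.ones(512), weights)
```

In this toy case the second frame receives a weight close to 1, so the fused signal falls back to the mixture there and the loss counts errors in that frame roughly twice as heavily as in the untouched frame.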

Paper Structure (18 sections, 7 equations, 3 figures, 3 tables)

Introduction
Proposed Approach
DN stage
DR & RST stage
Encoder and decoder
Quantization
Training objective
Enhancing RST with weighted loss and feature fusion
Weighted loss
Feature fusion layer integration
Experimental setup
Datasets and settings
Evaluation metrics
Results
Effectiveness of progressive learning pipeline
...and 3 more sections

Figures (3)

Figure 1: The proposed RestSE framework, illustrating the progressive pipeline with two sequential stages: DN and DR&RST.
Figure 2: Trend bar plot of OVRL scores across various SNR levels.
Figure 3: Comparison of Spectrograms. The red-boxed areas highlight RestSE's ability to restore regions over-suppressed by the LSTM.
