RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

TL;DR

RaD-Net tackles speech quality degradation in communication systems by combining a COM-Net-based repairing module with a denoising network, and trains the system adversarially with multi-resolution and multi-band discriminators. A three-step training strategy stabilizes optimization, balancing front-end repair and back-end denoising under perceptual and adversarial criteria. The approach improves objective metrics and DNSMOS scores and ranks 2nd in track 1 and 3rd in track 2 of the ICASSP 2024 SSI Challenge. Two model configurations are submitted so that each meets its track's real-time factor (RTF) requirement. This work shows that replacing the repairing component with COM-Net and adding targeted discriminative training can substantially improve speech signal quality under diverse distortions.
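The text does not specify how the discriminators are configured. As a rough illustration only, the sketch below shows the kind of input such discriminators commonly consume. It assumes a UnivNet-style multi-resolution design, which uses magnitude STFTs at several FFT sizes, and a multi-band design that, by assumption, slices each spectrogram into frequency sub-bands. The resolutions, band edges, and the function names `stft_magnitude`, `multi_resolution_inputs`, and `split_bands` are hypothetical and are not taken from RaD-Net.

```python
import numpy as np

def stft_magnitude(x, n_fft, hop, win_len):
    """Magnitude STFT: Hann-windowed frames of win_len samples, FFT size n_fft."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, n=n_fft, axis=-1)  # shape: (frames, n_fft // 2 + 1)
    return np.abs(spec)

# Hypothetical (n_fft, hop, win_len) settings, not the paper's values.
RESOLUTIONS = [(512, 128, 512), (1024, 256, 1024), (2048, 512, 2048)]

def multi_resolution_inputs(x, resolutions=RESOLUTIONS):
    """One magnitude spectrogram per resolution (one per discriminator, by assumption)."""
    return [stft_magnitude(x, *r) for r in resolutions]

def split_bands(mag, band_edges):
    """Split a (frames, bins) magnitude into sub-bands at fractional edges in [0, 1]."""
    n_bins = mag.shape[-1]
    idx = [round(e * n_bins) for e in band_edges]
    return [mag[:, lo:hi] for lo, hi in zip(idx[:-1], idx[1:])]

# Usage: 1 s of noise at 16 kHz, three resolutions, five sub-bands of one of them.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
mags = multi_resolution_inputs(x)
bands = split_bands(mags[1], [0.0, 0.1, 0.25, 0.5, 0.75, 1.0])
```

In an adversarial setup, each magnitude (or each sub-band) would typically be fed to its own discriminator sub-network. Those networks and the loss terms are omitted here.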

Abstract

This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous two-stage framework and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, we adopt multi-resolution and multi-band discriminators during training. Finally, we optimize the model with a three-step training strategy. We submit two models with different parameter configurations to meet the real-time factor (RTF) requirements of the two tracks. According to the official results, the proposed systems rank 2nd in track 1 and 3rd in track 2.
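The real-time factor (RTF) is the ratio of processing time to the duration of the processed audio. An RTF below 1 means the system runs faster than real time. The sketch below is a minimal way to measure it, assuming offline timing of a whole utterance. `process` is a hypothetical stand-in for model inference, and the challenge's exact measurement protocol and per-track limits are not reproduced here.

```python
import time
from typing import Callable, Sequence

def rtf(elapsed_s: float, audio_s: float) -> float:
    """Real-time factor: processing time divided by audio duration (both in seconds)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_s / audio_s

def measure_rtf(process: Callable[[Sequence[float]], object],
                audio: Sequence[float], sample_rate: int) -> float:
    """Time a single call of `process` (hypothetical inference function) on `audio`."""
    start = time.perf_counter()
    process(audio)
    elapsed_s = time.perf_counter() - start
    return rtf(elapsed_s, len(audio) / sample_rate)

# Example: 2 s of compute on 10 s of audio gives an RTF of 0.2.
print(rtf(2.0, 10.0))  # prints 0.2
```

Streaming systems are often timed per frame instead (processing time per frame divided by the frame shift). The ratio is the same idea at a finer granularity.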

Paper Structure

This paper contains 11 sections, 1 equation, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Architecture of RaD-Net.