BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators

Zihan Zhang; Jiayao Sun; Xianjun Xia; Chuanzeng Huang; Yijian Xiao; Lei Xie

TL;DR

BS-PLCNet addresses packet loss in VoIP by splitting the full-band signal into a wide-band path and a high-band path, processed by a large GCRN and a lightweight GRU, respectively, under a multi-task learning framework that includes $f_0$ prediction and linguistic awareness. The model is trained with multiple discriminators (MPD, MFD, MetricGAN+) to enhance perceptual quality, and the loss combines PLCPA, MAE, $f_0$, Whisper-based linguistic, and GAN terms. Experiments on realistic loss patterns and diverse datasets show that the auxiliary tasks and discriminators improve MOS, WER, and the overall score, with the full BS-PLCNet tying for 1st place in the ICASSP 2024 PLC Challenge. The approach yields a compact model with strong ASR compatibility and practical runtime, making it viable for real-time PLC in VoIP systems.

Abstract

Packet loss is a common and unavoidable problem in voice over Internet Protocol (VoIP) systems. To deal with the problem, we propose a band-split packet loss concealment network (BS-PLCNet). Specifically, we split the full-band signal into wide-band (0-8 kHz) and high-band (8-24 kHz) components. The wide-band signal is processed by a gated convolutional recurrent network (GCRN), while the high-band counterpart is processed by a simple GRU network. To ensure high speech quality and automatic speech recognition (ASR) compatibility, a multi-task learning (MTL) framework including fundamental frequency (f0) prediction and linguistic awareness is used, together with multiple discriminators. The proposed approach tied for 1st place in the ICASSP 2024 PLC Challenge.
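
The band split described above can be illustrated with a minimal sketch. It assumes a 48 kHz sampling rate (implied by the 24 kHz upper band edge) and a one-sided STFT spectrogram that is cut at the frequency bin nearest 8 kHz. The FFT size of 960, the function names `split_bands` and `merge_bands`, and the choice to split in the frequency-bin domain are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def split_bands(spec, sample_rate=48000, cutoff_hz=8000):
    """Split a one-sided spectrogram of shape (freq_bins, frames) at cutoff_hz.

    Hypothetical helper: the actual split used by BS-PLCNet may differ.
    """
    n_bins = spec.shape[0]
    n_fft = 2 * (n_bins - 1)            # one-sided STFT has n_fft/2 + 1 bins
    hz_per_bin = sample_rate / n_fft    # frequency resolution of each bin
    split_bin = int(round(cutoff_hz / hz_per_bin))
    wide = spec[:split_bin]             # bins below the cutoff (wide-band path)
    high = spec[split_bin:]             # bins from the cutoff up (high-band path)
    return wide, high

def merge_bands(wide, high):
    """Concatenate the two processed bands back into a full-band spectrogram."""
    return np.concatenate([wide, high], axis=0)

# Dummy complex spectrogram: 481 bins (assumed n_fft = 960) by 10 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((481, 10)) + 1j * rng.standard_normal((481, 10))
wide, high = split_bands(spec)
full = merge_bands(wide, high)
```

In a full system, each band would be passed through its own network (the GCRN for the wide band, the GRU for the high band) before merging; here the bands are merged unchanged only to show that the split is lossless.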

Paper Structure (8 sections, 2 equations, 1 figure, 2 tables)

Introduction
Proposed method
BS-PLCNet generator
Multi-discriminators
Loss function
Experiments
Datasets and experiments setup
Results and conclusions

Figures (1)

Figure 1: The proposed BS-PLCNet (a), the structure of the TFDCM module (b) and the structure of the f0 prediction module (c).
