On Improving Error Resilience of Neural End-to-End Speech Coders
Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus
TL;DR
Packet loss and jitter in UDP-based VoIP degrade speech quality. The paper extends the Neural End-to-End Speech Codec (NESC) with a low-complexity, latent-domain packet loss concealment (PLC) and an in-band FEC that adds 0.8 kbps, for a total bitrate of up to 4 kbps. It introduces a distilled, 256-vector latent codebook that supports both PLC and FEC, and a causal convolutional PLC predictor that operates on past distilled vectors to conceal losses autoregressively. Evaluations with objective metrics and a P.808 listening test show that combining PLC with FEC gives robust performance under burst losses, approaching or matching higher-bitrate baselines with minimal added complexity. The work makes error-resilient neural speech codecs more practical to deploy in real networks and points to future extensions to other end-to-end models and neural FEC methods.
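As a rough sanity check on the 0.8 kbps FEC figure, the sketch below computes the bitrate needed to send one index into a 256-vector codebook per frame. The frame rate of 100 frames per second (10 ms frames) and the one-index-per-frame payload are assumptions made for illustration, not details stated in this summary; the function name is hypothetical.

```python
import math


def index_bitrate_bps(codebook_size: int, frames_per_second: float) -> float:
    """Bitrate in bits per second for sending one codebook index per frame."""
    bits_per_index = math.log2(codebook_size)  # 256 entries -> 8 bits
    return bits_per_index * frames_per_second


# ASSUMPTION: 100 frames per second (10 ms frames) and one distilled
# index per frame as the redundant FEC payload.
fec_bps = index_bitrate_bps(256, 100)
print(fec_bps)  # 800.0 bits per second, i.e. 0.8 kbps
```

Under these assumptions, an 8-bit index per 10 ms frame accounts for exactly 0.8 kbps, which is consistent with the stated FEC overhead.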
Abstract
Error-resilience tools such as Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential for maintaining reliable speech communication in applications like Voice over Internet Protocol (VoIP), where packets are frequently delayed or lost. End-to-end neural speech codecs have recently risen to prominence because they can transmit speech signals at low bitrates, but little consideration has been given to their error resilience in a real system. The recently introduced Neural End-to-End Speech Codec (NESC) can reproduce high-quality natural speech at low bitrates. We extend its robustness to packet losses by adding a low-complexity network that predicts the codebook indices in latent space. Furthermore, we propose a method to add an in-band FEC at an additional bitrate of 0.8 kbps. Both subjective and objective assessments indicate the effectiveness of the proposed methods and demonstrate that coupling PLC and FEC provides significant robustness against packet losses.
