A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Jean-Marc Valin, Jan Skoglund
TL;DR
This work presents a real-time, ultra-low-bitrate neural vocoder based on LPCNet that codes wideband speech at 1.6 kb/s by combining linear prediction with sparse recurrent networks. The architecture pairs a frame-rate conditioning network with a sample-rate network, uses carefully designed quantization of pitch and cepstrum, and groups the encoded parameters into 40 ms packets that keep latency low. Trained with noise injection, domain adaptation, and data augmentation, the 1.6 kb/s codec delivers significantly higher quality than MELP, and unquantized LPCNet can exceed the quality of a low-bitrate waveform codec. Together, these results demonstrate that neural synthesis is practical for ultra-low-bitrate speech coding. The authors identify neural post-filtering and operation at higher bitrates as directions for further closing the gap to uncompressed speech while preserving real-time operation on mobile devices.
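The bitrate and packet size above fix the bit budget directly. The worked arithmetic below shows it, assuming the standard 16 kHz sampling rate for wideband speech (the sampling rate is not stated in this section):

$$
1600\ \tfrac{\text{bits}}{\text{s}} \times 0.040\ \text{s} = 64\ \tfrac{\text{bits}}{\text{packet}},
\qquad
\frac{64\ \text{bits}}{16000\ \tfrac{\text{samples}}{\text{s}} \times 0.040\ \text{s}} = \frac{64}{640} = 0.1\ \tfrac{\text{bits}}{\text{sample}}
$$

Each 40 ms packet must therefore describe 640 output samples with only 64 bits. That is why the decoder has to synthesize the waveform from compact pitch and cepstral parameters instead of coding it directly.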
Abstract
Neural speech synthesis algorithms are a promising new approach for coding speech at very low bitrates. They have so far demonstrated quality that far exceeds that of traditional vocoders, at the cost of very high complexity. In this work, we present a low-bitrate neural vocoder based on the LPCNet model. The use of linear prediction and sparse recurrent networks makes it possible to achieve real-time operation on general-purpose hardware. We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate. This opens the way for new codec designs based on neural synthesis models.
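Linear prediction reduces the network's work by letting a cheap linear filter model the spectral envelope. In LPCNet, a prediction p_t = Σ a_k s_{t-k} is computed from past samples, and the network only has to model the excitation e_t = s_t - p_t. The sketch below is a minimal, hedged illustration of that split, not the authors' code. The function names (`lpc_predict`, `residual`, `synthesize`), the predictor order, and the coefficients in the test are hypothetical. It also assumes the excitation is simply given, whereas the real model samples it from a learned distribution.

```python
from typing import Sequence


def lpc_predict(history: Sequence[float], a: Sequence[float]) -> float:
    """Linear prediction p_t = sum_{k=1..M} a_k * s_{t-k}.

    `history` holds past samples with the most recent last; `a` holds a_1..a_M.
    """
    order = len(a)
    if len(history) < order:
        raise ValueError("need at least M past samples")
    # a[0] weights s_{t-1} (the newest sample), a[M-1] weights s_{t-M}
    return sum(a[k] * history[-1 - k] for k in range(order))


def residual(signal: Sequence[float], a: Sequence[float],
             init: Sequence[float]) -> list[float]:
    """Excitation e_t = s_t - p_t, predicting from the true past samples."""
    history = list(init)
    out = []
    for s in signal:
        out.append(s - lpc_predict(history, a))
        history.append(s)
    return out


def synthesize(excitation: Sequence[float], a: Sequence[float],
               init: Sequence[float]) -> list[float]:
    """Rebuild s_t = p_t + e_t, feeding each output sample back as history."""
    samples = list(init)
    for e in excitation:
        samples.append(lpc_predict(samples, a) + e)
    return samples[len(init):]
```

Running `residual` and then `synthesize` with the same coefficients recovers the input up to floating-point rounding. This shows that, given a good predictor, generating the excitation is enough to generate the speech.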
