QINCODEC: Neural Audio Compression with Implicit Neural Codebooks

Zineb Lahrichi, Gaëtan Hadjeres, Gael Richard, Geoffroy Peeters

TL;DR

QinCodec addresses the bottleneck of differentiable quantization in neural audio codecs by decoupling autoencoder training from quantizer learning and applying offline quantization with implicit neural codebooks. The method proceeds in three steps: pretrain a continuous autoencoder, quantize latent codes offline with a residual vector quantizer (Qinco2), and optionally finetune the decoder to mitigate quantization artifacts. It achieves competitive performance at 16 kbps and remains effective at 8 kbps while significantly reducing training complexity, thanks to the ability to reuse off-the-shelf quantizers. The work presents a general framework for modular neural audio coding that amortizes autoencoder pretraining and enables flexible codec design for generative modeling tasks.

Abstract

Neural audio codecs, neural networks that compress a waveform into discrete tokens, play a crucial role in the recent development of audio generative models. State-of-the-art codecs rely on the end-to-end training of an autoencoder and a quantization bottleneck. However, this approach restricts the choice of quantization methods, as it requires defining how gradients propagate through the quantizer and how the quantization parameters are updated online. In this work, we revisit the common practice of joint training and propose to quantize the latent representations of a pre-trained autoencoder offline, followed by an optional finetuning of the decoder to mitigate degradation from quantization. This strategy makes it possible to use any off-the-shelf quantizer, especially state-of-the-art trainable quantizers with implicit neural codebooks such as QINCO2. We demonstrate that with the latter, our proposed codec, termed QINCODEC, is competitive with baseline codecs while being notably simpler to train. Finally, our approach provides a general framework that amortizes the cost of autoencoder pretraining and enables more flexible codec design.
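
To make the offline quantization step concrete, the following is a minimal sketch of fitting a quantizer on frozen latent vectors after the autoencoder has been trained. It is not the paper's method: it swaps in a plain residual vector quantizer with k-means codebooks instead of QINCO2's implicit neural codebooks, and a random array stands in for the encoder outputs. All function names, sizes, and hyperparameters here are hypothetical and chosen only for illustration.

```python
import numpy as np

def nearest(x, codebook):
    """Index of the closest codebook entry for each row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(x, codebook)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                codebook[j] = members.mean(0)
    return codebook

def fit_rvq(latents, num_codebooks, codebook_size):
    """Fit one codebook per stage on the residual left by earlier stages."""
    residual = latents.copy()
    codebooks = []
    for stage in range(num_codebooks):
        cb = kmeans(residual, codebook_size, seed=stage)
        residual = residual - cb[nearest(residual, cb)]
        codebooks.append(cb)
    return codebooks

def encode(latents, codebooks):
    """Map each latent vector to one index per codebook."""
    residual = latents.copy()
    codes = []
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes, axis=1)

def decode(codes, codebooks):
    """Sum the selected entries of every codebook."""
    return sum(cb[codes[:, i]] for i, cb in enumerate(codebooks))

# Stand-in for latents produced by a frozen, pretrained encoder (assumption).
latents = np.random.default_rng(0).standard_normal((1000, 8))
codebooks = fit_rvq(latents, num_codebooks=4, codebook_size=16)
codes = encode(latents, codebooks)

# Reconstruction error using only the first stage vs. all four stages.
mse_1 = float(((latents - decode(codes[:, :1], codebooks[:1])) ** 2).mean())
mse_4 = float(((latents - decode(codes, codebooks)) ** 2).mean())
```

Because the quantizer is fitted after the autoencoder is frozen, no gradient needs to flow through the `argmin` in `nearest`, which is the constraint that joint training has to work around. In the pipeline described above, the decoder could then optionally be finetuned on `decode(codes, codebooks)` rather than on the continuous latents.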

Paper Structure

This paper contains 23 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Training procedure of QinCodec with offline quantization: First, we train a continuous compression model with spectral and adversarial losses. Next, we quantize the bottleneck latent vectors into discrete embeddings. We then finetune the decoder on the quantized representations.
  • Figure 2: MUSHRA scores with 95% confidence intervals for DAC, QinCodec, and a 3.5 kHz low-pass anchor, evaluated at 8 kbps and 16 kbps.
  • Figure 3: Perplexity vs. number of codebooks for EnCodec, DAC, QinCodec, and QinCodec with iRVQ quantization.
  • Figure 4: Performance of QinCodec at various bitrates and latent dimensions.
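
Figure 3 reports perplexity against the number of codebooks. The source text here does not define the metric, so the sketch below assumes the common definition: the exponential of the entropy of the empirical code-usage distribution for one codebook. Under that assumption, a codebook whose entries are all used equally often has a perplexity equal to its size, and a collapsed codebook that always emits one entry has a perplexity of 1. The function name is hypothetical.

```python
import numpy as np

def codebook_perplexity(indices, codebook_size):
    """exp(entropy) of how often each codebook entry is selected."""
    counts = np.bincount(indices, minlength=codebook_size)
    probs = counts / counts.sum()
    used = probs[probs > 0]  # unused entries contribute nothing to the entropy
    entropy = -(used * np.log(used)).sum()
    return float(np.exp(entropy))

# Every one of 8 entries used equally often vs. a single entry used always.
uniform_usage = codebook_perplexity(np.arange(8).repeat(10), codebook_size=8)
collapsed_usage = codebook_perplexity(np.zeros(80, dtype=int), codebook_size=8)
```

A higher value indicates that more of the codebook's capacity is actually used by the encoded data.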