Rate-Adaptive Quantization: A Multi-Rate Codebook Adaptation for Vector Quantization-based Generative Models

Jiwan Seo; Joonhyuk Kang

TL;DR

RAQ addresses the rigidity of fixed-rate VQ-based generative models by enabling dynamic, multi-rate codebooks without retraining. It introduces a Seq2Seq-based rate adaptation module that autoregressively generates an adapted codebook of size $\tilde{K}$ from an original codebook of size $K$, using cross-forcing to stabilize training. A model-based alternative built on differentiable $k$-means (DKM) and inverse functional DKM (IKM) offers a competitive, parameter-free fallback for rate adjustment. Experiments on CIFAR10, CelebA, and ImageNet show that a single RAQ-enabled model matches or surpasses fixed-rate baselines across rates, with favorable trade-offs in reconstruction quality and perceptual metrics. The work broadens the applicability of VQ-based models to real-time, bandwidth-varying scenarios by reducing the need for multiple, separately trained models.
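To make the link between codebook size and rate concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed-length code, so each index costs $\log_2 \tilde{K}$ bits, and uses plain nearest-neighbour quantization. The function names `bits_per_index` and `quantize`, the 8x8 latent grid, and the random stand-in codebooks are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def bits_per_index(codebook_size: int) -> float:
    """Bits needed to send one code index with a fixed-length code."""
    return math.log2(codebook_size)

def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector (N, d) to the index of its nearest codeword (K, d)."""
    # Squared Euclidean distance between every latent and every codeword: (N, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
latents = rng.normal(size=(64, 8))        # hypothetical 8x8 latent grid, d = 8
for k_tilde in (128, 512, 2048):
    codebook = rng.normal(size=(k_tilde, 8))  # random stand-in for an adapted codebook
    indices = quantize(latents, codebook)
    total_bits = len(indices) * bits_per_index(k_tilde)
    print(f"K~={k_tilde}: {total_bits:.0f} bits for {len(indices)} indices")
```

Under these assumptions, changing $\tilde{K}$ changes only the codebook used at quantization time; the latents themselves are untouched, which is why a single model can serve several rates.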

Abstract

Learning discrete representations with vector quantization (VQ) has emerged as a powerful approach in various generative models. However, most VQ-based models rely on a single, fixed-rate codebook, requiring extensive retraining for new bitrates or efficiency requirements. We introduce Rate-Adaptive Quantization (RAQ), a multi-rate codebook adaptation framework for VQ-based generative models. RAQ applies a data-driven approach to generate variable-rate codebooks from a single baseline VQ model, enabling flexible tradeoffs between compression and reconstruction fidelity. Additionally, we provide a simple clustering-based procedure for pre-trained VQ models, offering an alternative when retraining is infeasible. Our experiments show that RAQ performs effectively across multiple rates, often outperforming conventional fixed-rate VQ baselines. By enabling a single system to seamlessly handle diverse bitrate requirements, RAQ extends the adaptability of VQ-based generative models and broadens their applicability to data compression, reconstruction, and generation tasks.
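The clustering-based procedure for pre-trained models can be illustrated for the rate-reduction case ($\tilde{K} < K$) with a short sketch. Note the substitution: this uses ordinary Lloyd's $k$-means on the codewords, not the differentiable $k$-means (DKM) that the paper's model-based variant relies on, and it does not cover increasing the rate. The function name `reduce_codebook`, the iteration count, and the random stand-in codebook are assumptions made for illustration.

```python
import numpy as np

def reduce_codebook(codebook: np.ndarray, k_tilde: int,
                    iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster K codewords (K, d) into k_tilde centroids with Lloyd's k-means.

    Plain k-means is used here as a stand-in for DKM.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct existing codewords.
    init = rng.choice(len(codebook), size=k_tilde, replace=False)
    centroids = codebook[init].copy()
    for _ in range(iters):
        # Assign every codeword to its nearest centroid.
        dists = ((codebook[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned codewords.
        for j in range(k_tilde):
            members = codebook[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

rng = np.random.default_rng(1)
original = rng.normal(size=(512, 64))    # random stand-in for a trained codebook, K = 512
adapted = reduce_codebook(original, 128) # adapted codebook, K~ = 128
print(adapted.shape)
```

The adapted centroids would then replace the original codebook at quantization time, with the encoder and decoder left unchanged.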

Paper Structure (48 sections, 8 equations, 6 figures, 10 tables, 1 algorithm)

Introduction
Background
Vector-Quantized Variational AutoEncoder
Sequence-to-Sequence Learning
Methods
Rate-Adaptive Quantization
Overview
Autoregressive Codebook Generation
Codebook Encoding
Codebook Decoding via Cross-Forcing
Training Procedure
Model-Based RAQ
Codebook Clustering
Reducing the Rate $(\widetilde{K} < K)$
Increasing the Rate $(\widetilde{K} > K)$
...and 33 more sections

Figures (6)

Figure 1: An overview of our RAQ framework applied to a baseline VQ-VAE architecture. During training, the rate adaptation module $G_{\psi}$ employs cross-forcing to generate adapted codebooks $\mathbf{\tilde{e}}$ of arbitrary sizes $\widetilde{K}$ from the original codebook $\mathbf{e}$. At inference, a user-specified $\widetilde{K}$ produces the corresponding adapted codebook for quantization.
Figure 2: Reconstruction Performance on (a) CIFAR-10 and (b) CelebA at various codebook sizes $\widetilde{K}$. Higher values are better for PSNR, SSIM, and codebook perplexity, while lower values are better for rFID. Black lines: individual VQ-VAE models for each codebook size $\widetilde{K}$. Colored lines: single RAQ-based model adapted from codebook size $K$ to $\widetilde{K}$. The shaded area indicates the 95.45% confidence interval based on 4 runs with different seeds.
Figure 3: Qualitative comparison on ImageNet ($256\times256$) at different compression rates. Top row: fixed-rate VQ-VAEs trained separately at each rate. Middle row: a single VQ-VAE ($K=4096$) with randomly selected codebooks. Bottom row: our RAQ applied to a VQ-VAE ($K=512$), adapting the codebook size.
Figure 4: Reconstruction performance at different rates (adapted codebook sizes) evaluated on the CelebA ($64\times64$) test set. The black curves are VQ-VAE-2 models (Razavi et al., 2019) trained separately for each codebook size, while each RAQ curve corresponds to a single model.
Figure 5: Reconstructed images for the Kodak dataset at different rates.
...and 1 more figure
