DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Mohammad Hassan Vali, Tom Bäckström, Arno Solin

TL;DR

Vector quantization in neural networks is non-differentiable due to hard nearest-neighbor assignments, hindering end-to-end training. The authors introduce DiVeQ, a differentiable surrogate using directional reparameterization that preserves hard forward quantization while enabling gradient flow, and SF-DiVeQ, which quantizes along line segments between codewords to improve codebook utilization. Across VQ-VAE and VQGAN tasks on multiple image datasets, DiVeQ and SF-DiVeQ consistently surpass prior approaches (e.g., STE, EMA, RT, ST-GS, NSVQ) in reconstruction quality and generation fidelity, without auxiliary losses or temperature schedules. SF-DiVeQ additionally avoids codebook misalignment and eliminates heuristic codebook replacement, acting as a robust, drop-in differentiable quantization option for compression and generative models. These methods offer practical gains for end-to-end training of discrete latent models and broaden the applicability of differentiable quantization techniques in deep learning.

Abstract

Vector quantization is common in deep models, yet its hard assignments block gradients and hinder end-to-end training. We propose DiVeQ, which treats quantization as adding an error vector that mimics the quantization distortion, keeping the forward pass hard while letting gradients flow. We also present a space-filling variant (SF-DiVeQ) that assigns to a curve constructed by the lines connecting codewords, resulting in less quantization error and full codebook usage. Both methods train end-to-end without requiring auxiliary losses or temperature schedules. On VQ-VAE compression and VQGAN generation across various data sets, they improve reconstruction and sample quality over alternative quantization approaches.
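The two ideas in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation; it is one plausible reading under stated assumptions. For DiVeQ it assumes the quantized output has the form $\hat{z} = z + \|c_{i^*} - z\|_2 \cdot u$, where $u$ is the unit vector along $(c_{i^*} - z)$ perturbed by Gaussian noise of standard deviation `sigma`, and where $u$ would be wrapped in a stop-gradient in an autodiff framework. For SF-DiVeQ it assumes the curve is the polyline through codewords in index order and that the input is mapped to the nearest point on it. The function names and the toy codebook are hypothetical.

```python
import math
import random


def nearest_codeword(z, codebook):
    """Index of the codeword closest to z in Euclidean distance."""
    return min(range(len(codebook)), key=lambda j: math.dist(z, codebook[j]))


def diveq_forward(z, codebook, sigma=0.0, rng=random):
    """Forward pass of the ASSUMED form z_hat = z + ||c - z|| * unit(v).

    v = (c - z) + sigma * noise is the assumed directional vector. In an
    autodiff framework, unit(v) would be held constant (stop-gradient) so
    that gradients reach z and c only through the magnitude ||c - z||.
    """
    c = codebook[nearest_codeword(z, codebook)]
    magnitude = math.dist(z, c)
    if magnitude == 0.0:
        return list(z)  # z already sits on a codeword
    v = [ci - zi + sigma * rng.gauss(0.0, 1.0) for ci, zi in zip(c, z)]
    norm = math.hypot(*v)
    return [zi + magnitude * vi / norm for zi, vi in zip(z, v)]


def nearest_point_on_polyline(z, codebook):
    """Closest point to z on the segments joining consecutive codewords.

    ASSUMPTION: the space-filling curve is the polyline c_0 -> c_1 -> ...
    in index order; the paper may order or sample the curve differently.
    """
    best, best_dist = None, math.inf
    for a, b in zip(codebook, codebook[1:]):
        ab = [bi - ai for ai, bi in zip(a, b)]
        denom = sum(x * x for x in ab)
        if denom == 0.0:
            t = 0.0  # degenerate segment: both endpoints coincide
        else:
            t = sum((zi - ai) * x for zi, ai, x in zip(z, a, ab)) / denom
        t = min(1.0, max(0.0, t))  # clamp the projection onto the segment
        p = [ai + t * x for ai, x in zip(a, ab)]
        d = math.dist(z, p)
        if d < best_dist:
            best, best_dist = p, d
    return best


def mean_error_to_codeword(z, codebook, sigma, n=1000, seed=0):
    """Average distance between the noisy mapping and the nearest codeword."""
    rng = random.Random(seed)
    c = codebook[nearest_codeword(z, codebook)]
    total = sum(math.dist(diveq_forward(z, codebook, sigma, rng), c) for _ in range(n))
    return total / n


codebook = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]]  # toy 2-D codebook
z = [0.9, 0.5]                                   # toy encoder output

hard = codebook[nearest_codeword(z, codebook)]
z_hat = diveq_forward(z, codebook, sigma=0.0)      # coincides with `hard`
z_sf = nearest_point_on_polyline(z, codebook)      # lies between codewords

for sigma in (1.0, 0.1, 0.01):
    print(sigma, mean_error_to_codeword(z, codebook, sigma))
```

With `sigma=0.0` the mapping lands on the nearest codeword, so the forward pass stays hard. Under the same assumptions, the mean error printed by the loop shrinks as `sigma` decreases, and the polyline projection is never farther from `z` than the nearest codeword because every codeword lies on the polyline.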

Paper Structure

This paper contains 43 sections, 23 equations, 30 figures, 6 tables.

Figures (30)

  • Figure 1: We replace the non-differentiable VQ operation ($\hat{{\bm{z}}} = {\bm{c}}_{i^*} = \arg\min_{{\bm{c}}_j} \|{\bm{z}}-{\bm{c}}_j\|_2$) on the left with differentiable vector quantization (DiVeQ) on the right that lets the gradients flow.
  • Figure 2: Illustration of NSVQ quantization. Input ${\bm{z}}$ is mapped to a random point on the circle. The mapping overshoots the true quantization error with probability $\theta_2/360^{\circ}\approx 0.67$, leading to a higher distortion than the nearest-codeword assignment.
  • Figure 3: Impact of $\sigma^2$ in DiVeQ quantization accuracy. Each panel shows mappings of input ${\bm{z}}_i$ to its closest codeword ${\bm{c}}_{i^*}$ using our proposed DiVeQ (\ref{eq:diveq}) when sampling $1000$ random directional vectors ${\mathbf{v}}_d$ from $\mathcal{N}(\bm{0}, \sigma^2{\bm{I}})$. DiVeQ quantization accuracy increases when $\sigma^{2} \to 0$ (see \ref{app:var_ablation}).
  • Figure 4: Codebook misalignment: t-SNE plots of the learned codebook $\mathcal{C}_z$ (red crosses) and latent $\mathcal{P}_z$ (gray points) representations for different VQ methods in VQ-VAE compression. The figure shows the misalignment between $\mathcal{C}_z$ and $\mathcal{P}_z$ (discussed in \ref{paragraph_misalignment}) for different methods. The plots refer to the cases highlighted in \ref{fig:misalign_metrics_plot}. The numbers report distortion per bit$\downarrow$ (see \ref{app:misalignment}).
  • Figure 5: DiVeQ and SF-DiVeQ improve image reconstruction. Qualitative comparison of reconstructed images in VQ-VAE compression task for different VQ optimization methods with VQ bitrate of $11$ (or codebook size of $2^{11}=2048$). We report LPIPS$\downarrow$ values in the left-hand corners.
  • ...and 25 more figures