Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization

Takashi Morita

TL;DR

This work tackles the non-differentiability of vector quantization by reframing it as a smoothing problem on the simplex and introducing a simple, geometry-driven regularizer. The $K$-nearest-neighbor (KNN) loss encourages both tight smoothing (onehot-like behavior) and full codebook utilization, avoiding code collapse across tasks. Empirical results on ImageNet-based discrete autoencoding and Wav2Vec 2.0 pretraining demonstrate robust codebook usage and competitive performance compared with the straight-through estimator (STE), Gumbel-softmax, and perplexity-based methods. The approach offers a scalable, task-agnostic path to reliable smoothed vector quantization in neural networks.

Abstract

Vector quantization, which discretizes a continuous vector space into a finite set of representative vectors (a codebook), has been widely adopted in modern machine learning. Despite its effectiveness, vector quantization poses a fundamental challenge: the non-differentiable quantization step blocks gradient backpropagation. Smoothed vector quantization addresses this issue by relaxing the hard assignment of a codebook vector into a weighted combination of codebook entries, represented as the matrix product of a simplex vector and the codebook. Effective smoothing requires two properties: (1) smoothed quantizers should remain close to a onehot vector, ensuring tight approximation, and (2) all codebook entries should be utilized, preventing code collapse. Existing methods typically address these desiderata separately. By contrast, the present study introduces a simple and intuitive regularization that promotes both simultaneously by minimizing the distance between each simplex vertex and its $K$-nearest smoothed quantizers. Experiments on representative benchmarks, including discrete image autoencoding and contrastive speech representation learning, demonstrate that the proposed method achieves more reliable codebook utilization and improves performance compared to prior approaches.
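
The two ingredients described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names (`smoothed_quantize`, `knn_vertex_loss`), the use of squared L2 distance, and the averaging over vertices and neighbors are all assumptions made here for concreteness. The sketch also assumes that $K$ does not exceed the number of samples in the batch.

```python
def smoothed_quantize(p, codebook):
    """Weighted combination of codebook rows: simplex vector p (length M)
    times codebook (M x D). A onehot p returns a single codebook entry."""
    dim = len(codebook[0])
    return [sum(p[m] * codebook[m][d] for m in range(len(codebook)))
            for d in range(dim)]


def knn_vertex_loss(simplex_batch, k):
    """Assumed form of the regularizer: for each simplex vertex e_m, average
    the squared L2 distance to its k nearest samples, then average over m."""
    num_codes = len(simplex_batch[0])
    total = 0.0
    for m in range(num_codes):
        # Squared distance from every sample to vertex e_m, sorted ascending.
        dists = sorted(
            sum((p[j] - (1.0 if j == m else 0.0)) ** 2 for j in range(num_codes))
            for p in simplex_batch
        )
        total += sum(dists[:k]) / k
    return total / num_codes


# Three toy batches on the 3-vertex simplex (hypothetical data).
spread = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]] * 2
centered = [[1 / 3, 1 / 3, 1 / 3]] * 6
collapsed = [[0.98, 0.01, 0.01]] * 6  # every sample sits near vertex 0

loss_spread = knn_vertex_loss(spread, k=2)
loss_centered = knn_vertex_loss(centered, k=2)
loss_collapsed = knn_vertex_loss(collapsed, k=2)
print(loss_spread, loss_centered, loss_collapsed)

quantized = smoothed_quantize([0.0, 1.0, 0.0], [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])
```

Under these assumptions the loss is smallest when every vertex has nearby samples, larger when samples sit at the center of the simplex, and largest when all samples crowd around a single vertex, because the unused vertices then have no close neighbors.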

Paper Structure

This paper contains 19 sections, 13 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: (a) Four different distributions on the simplex $\Delta^{3-1}$. For effective smoothed vector quantization, samples should be concentrated near the vertices of the simplex (i.e., onehot-like vectors; orange), rather than centered (dark gray) or uniformly spread across the simplex (light gray). At the same time, each vertex must be neighbored by some samples to avoid code collapse (blue). (b) Maximizing the perplexity of the sample mean (Baevski et al., 2020) penalizes code collapse but cannot discriminate among the other three distributions. (c) The proposed $K$-nearest neighbor (KNN) distance minimization ($K=8$) favors the desired vertex-concentrated distribution while also preventing code collapse.
  • Figure 2: Dirichlet distributions on the simplex $\Delta^{3-1}$ with concentration parameters $\alpha_1=\alpha_2=\alpha_3=\alpha$, where $\alpha \in \{0.5, 1.0, 2.0\}$.
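
The point made in the Figure 1 caption, that the perplexity of the sample mean cannot tell a vertex-concentrated distribution from a centered or uniform one, can be checked numerically with the symmetric Dirichlet distributions of Figure 2. The sketch below is an illustration under stated assumptions: the helper names, the sample size of 2000, the fixed seed, and the use of the average largest weight as a rough measure of closeness to a vertex are all choices made here, not taken from the paper.

```python
import math
import random


def sample_dirichlet(alpha, dim, rng):
    """Draw one symmetric Dirichlet sample by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]


def perplexity_of_mean(samples):
    """exp(entropy) of the batch-averaged simplex vector."""
    dim = len(samples[0])
    mean = [sum(p[j] for p in samples) / len(samples) for j in range(dim)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0.0)
    return math.exp(entropy)


def mean_max_weight(samples):
    """Average largest component; values near 1 mean onehot-like samples."""
    return sum(max(p) for p in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
stats = {}
for alpha in (0.5, 1.0, 2.0):
    samples = [sample_dirichlet(alpha, 3, rng) for _ in range(2000)]
    stats[alpha] = (perplexity_of_mean(samples), mean_max_weight(samples))
    print(alpha, stats[alpha])
```

For all three values of $\alpha$ the perplexity of the mean stays close to its maximum of 3, since each symmetric Dirichlet has the uniform vector as its mean. The average largest weight, by contrast, shrinks as $\alpha$ grows, which reflects the shift of mass away from the vertices that a perplexity-based regularizer does not see.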