
Residual Quantization with Implicit Neural Codebooks

Iris A. M. Huijben, Matthijs Douze, Matthew Muckley, Ruud J. G. van Sloun, Jakob Verbeek

TL;DR

QINCo addresses a limitation of residual quantization (RQ), which uses a fixed codebook at each step. It learns step-specific, data-dependent codebooks with neural networks conditioned on the current partial reconstruction. It integrates with inverted-file (IVF) indices and approximate decoding to enable accurate, large-scale vector search with shorter codes (e.g., 12-byte QINCo codes outperform 16-byte UNQ codes on BigANN1M and Deep1M). The method keeps a standard RQ-like encoding/decoding path while supporting multi-rate coding, and extensive experiments and ablations show robust performance across diverse datasets. These results point to practical value for embedding compression and high-recall retrieval in large-scale systems. Promising directions include applying implicit codebooks to other multi-codebook quantization (MCQ) schemes and developing faster encoding strategies.

Abstract

Vector quantization is a fundamental operation for data compression and vector search. To obtain high accuracy, multi-codebook methods represent each vector using codewords across several codebooks. Residual quantization (RQ) is one such method, which iteratively quantizes the error of the previous step. While the error distribution is dependent on previously selected codewords, this dependency is not accounted for in conventional RQ as it uses a fixed codebook per quantization step. In this paper, we propose QINCo, a neural RQ variant that constructs specialized codebooks per step that depend on the approximation of the vector from previous steps. Experiments show that QINCo outperforms state-of-the-art methods by a large margin on several datasets and code sizes. For example, QINCo achieves better nearest-neighbor search accuracy using 12-byte codes than the state-of-the-art UNQ using 16 bytes on the BigANN1M and Deep1M datasets.
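The conventional RQ loop that the abstract describes can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's code: it assumes plain k-means (Lloyd's algorithm) to train one fixed codebook per step on the residuals left by earlier steps, and all function names are hypothetical. It also computes code size as M steps times log2(K) bits, assuming K is a power of two; with K = 256 (one byte per step), the 12-byte and 16-byte settings would correspond to M = 12 and M = 16 steps.

```python
import numpy as np


def nearest(residual, codebook):
    """Index of the closest codeword (squared L2) for each row of `residual`."""
    d2 = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed training method); returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(data, centroids)
        for j in range(k):
            members = data[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def train_rq(x, M, K):
    """Train one fixed codebook per step on the residuals left by earlier steps."""
    residual = x.copy()
    codebooks = []
    for m in range(M):
        C = kmeans(residual, K, seed=m)
        residual = residual - C[nearest(residual, C)]
        codebooks.append(C)
    return codebooks


def rq_encode(x, codebooks):
    """Greedy RQ encoding: at step m, quantize the current residual with C^m."""
    residual = x.copy()
    codes = np.empty((len(x), len(codebooks)), dtype=np.int64)
    for m, C in enumerate(codebooks):
        codes[:, m] = nearest(residual, C)
        residual = residual - C[codes[:, m]]
    return codes


def rq_decode(codes, codebooks, num_steps=None):
    """Sum the selected codewords; num_steps < M decodes only a code prefix."""
    num_steps = len(codebooks) if num_steps is None else num_steps
    x_hat = np.zeros((codes.shape[0], codebooks[0].shape[1]))
    for m in range(num_steps):
        x_hat += codebooks[m][codes[:, m]]
    return x_hat


def code_size_bytes(M, K):
    """Code size for M steps with K codewords each (K assumed a power of two)."""
    return M * (K - 1).bit_length() / 8


rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 8))
codebooks = train_rq(x, M=4, K=16)
codes = rq_encode(x, codebooks)
mse = ((x - rq_decode(codes, codebooks)) ** 2).sum(axis=1).mean()
print(f"MSE {mse:.3f} vs. signal energy {(x ** 2).sum(axis=1).mean():.3f}")
print(code_size_bytes(M=12, K=256))  # 12.0 (8 bits per step)
```

Note that the codebook used at step m is the same no matter which codewords were selected at earlier steps; this is the dependency that QINCo is designed to exploit.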

Paper Structure

This paper contains 26 sections, 4 equations, 12 figures, and 10 tables.

Figures (12)

  • Figure 1: Top: Given a vector ${\bm{x}}$, RQ iteratively quantizes the residuals of previous quantization steps, using a single codebook ${\bm{C}}^m$ for each step $m=1,\dots,M$. QINCo extends RQ by using data-dependent codebooks that are implicitly parameterized via a neural network $f_{\theta_m}$ that takes as input a base codebook $\bar{{\bm{C}}}^m$ and the partial reconstruction $\hat{{\bm{x}}}^m$ of the data vector ${\bm{x}}$. Bottom: Toy example with $M\!=\!2$ quantization steps, each with $K\!=\!2$ centroids. In RQ, the $2^{\text{nd}}$-level codebook centroids are independent of the $1^{\text{st}}$-level centroids, while QINCo adapts the $2^{\text{nd}}$-level centroids to the residuals, reducing the mean squared error (MSE) by 35%.
  • Figure 2: MSE (mean $\pm$ std. dev.) on BigANN1M across 16 quantization steps for QINCo ($L\!=\!16$), before training and after training on 10M samples.
  • Figure 3: Speed-accuracy trade-off in terms of queries per second (QPS) and recall@1 for IVF-QINCo, on BigANN1B ($10^9$ vectors), compared to IVF-PQ and IVF-RQ.
  • Figure 4: Performance of QINCo as a function of the number of residual blocks $L$, with a training set size $T$ of 500k (open) or 10M (solid).
  • Figure 5: The MSE after the $m^\text{th}$ quantization step is very similar for the 8-byte and 16-byte models on BigANN1M.
  • ...and 7 more figures
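Figure 1's caption describes the key change relative to RQ: the step-m codebook is produced by a network $f_{\theta_m}$ from a base codebook $\bar{{\bm{C}}}^m$ and the partial reconstruction $\hat{{\bm{x}}}^m$. The NumPy sketch below illustrates only that data flow, under stated assumptions. The network is a hypothetical stand-in: it concatenates each base codeword with $\hat{{\bm{x}}}^m$, applies a linear projection and $L$ residual ReLU MLP blocks, and adds the result to the base codeword. The layer sizes, initialization, and class and function names are all assumptions, the weights are random and untrained, and training is omitted entirely.

```python
import numpy as np


class ImplicitCodebookStep:
    """Hypothetical stand-in for f_{theta_m}: (base codebook, x_hat) -> codebook.

    Weights are random and untrained; the class only illustrates the data flow.
    """

    def __init__(self, K, D, hidden, L, rng):
        self.base = rng.normal(size=(K, D))  # base codebook \bar{C}^m
        self.W_in = rng.normal(scale=0.1, size=(2 * D, D))  # [c; x_hat] -> D
        self.blocks = [
            (rng.normal(scale=0.1, size=(D, hidden)),
             rng.normal(scale=0.1, size=(hidden, D)))
            for _ in range(L)
        ]

    def codebook(self, x_hat):
        """Return a separate (K, D) codebook for every vector: shape (N, K, D)."""
        N, D = x_hat.shape
        K = self.base.shape[0]
        base = np.broadcast_to(self.base, (N, K, D))
        ctx = np.broadcast_to(x_hat[:, None, :], (N, K, D))
        h = np.concatenate([base, ctx], axis=-1) @ self.W_in
        for W1, W2 in self.blocks:
            h = h + np.maximum(h @ W1, 0.0) @ W2  # residual MLP block with ReLU
        return base + h  # adapted codewords are offsets from the base codewords


def qinco_encode(x, steps):
    """Greedy encoding: each step picks the codeword closest to x - x_hat."""
    N = x.shape[0]
    x_hat = np.zeros_like(x)
    codes = np.empty((N, len(steps)), dtype=np.int64)
    for m, step in enumerate(steps):
        C = step.codebook(x_hat)  # depends on the partial reconstruction
        d2 = (((x - x_hat)[:, None, :] - C) ** 2).sum(axis=-1)
        codes[:, m] = d2.argmin(axis=1)
        x_hat = x_hat + C[np.arange(N), codes[:, m]]
    return codes, x_hat


def qinco_decode(codes, steps, num_steps=None):
    """Regenerate each step's codebook from x_hat, then add the chosen codeword."""
    N = codes.shape[0]
    D = steps[0].base.shape[1]
    num_steps = len(steps) if num_steps is None else num_steps
    x_hat = np.zeros((N, D))
    for m in range(num_steps):
        C = steps[m].codebook(x_hat)
        x_hat = x_hat + C[np.arange(N), codes[:, m]]
    return x_hat


rng = np.random.default_rng(0)
N, D, K, M = 500, 8, 16, 4
steps = [ImplicitCodebookStep(K, D, hidden=32, L=2, rng=rng) for _ in range(M)]
x = rng.normal(size=(N, D))
codes, x_hat_enc = qinco_encode(x, steps)
x_hat_dec = qinco_decode(codes, steps)
print(np.allclose(x_hat_enc, x_hat_dec))  # True: the decoder rebuilds the same codebooks
```

The decoder needs no stored per-vector codebooks. It recomputes them from the partial reconstruction it has already built, so the encoding/decoding path keeps the same shape as RQ. Because decoding is sequential, `qinco_decode(codes, steps, num_steps=m)` returns the step-m reconstruction from a code prefix, which is the property that makes truncated, multi-rate codes possible. The cost relative to RQ is that, in this sketch, every step evaluates the network for all K codewords of every vector during both encoding and decoding.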