Residual Quantization with Implicit Neural Codebooks
Iris A. M. Huijben, Matthijs Douze, Matthew Muckley, Ruud J. G. van Sloun, Jakob Verbeek
TL;DR
QINCo addresses a limitation of conventional residual quantization (RQ), which uses one fixed codebook per step: a neural network instead generates the codebook for each step, conditioned on the current partial reconstruction. The method can be combined with inverted-file indices and approximate decoding for large-scale vector search, and it reaches higher accuracy with shorter codes (e.g., 12-byte QINCo codes outperform 16-byte UNQ codes on BigANN1M and Deep1M). Encoding and decoding keep the step-by-step structure of standard RQ, and the method supports multi-rate coding. Experiments and ablations on several datasets support these results, which are relevant to embedding compression and high-recall retrieval in large-scale systems. Promising directions include applying implicit codebooks to other multi-codebook quantization (MCQ) schemes and speeding up encoding.
Abstract
Vector quantization is a fundamental operation for data compression and vector search. To obtain high accuracy, multi-codebook methods represent each vector using codewords across several codebooks. Residual quantization (RQ) is one such method, which iteratively quantizes the error of the previous step. While the error distribution depends on previously selected codewords, conventional RQ does not account for this dependency, as it uses a fixed codebook per quantization step. In this paper, we propose QINCo, a neural RQ variant that constructs specialized codebooks per step that depend on the approximation of the vector from previous steps. Experiments show that QINCo outperforms state-of-the-art methods by a large margin on several datasets and code sizes. For example, QINCo achieves better nearest-neighbor search accuracy using 12-byte codes than the state-of-the-art UNQ using 16 bytes on the BigANN1M and Deep1M datasets.
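To make the distinction between fixed and reconstruction-dependent codebooks concrete, the following is a minimal pure-Python sketch of greedy residual quantization in which the codebook for each step is supplied by a function of the step index and the current partial reconstruction. It is an illustration under stated assumptions, not the paper's implementation: the names `rq_encode`, `rq_decode`, `fixed_codebooks`, and `conditioned_codebooks` are hypothetical, the toy codebooks are hand-picked, and the scaling rule in `conditioned_codebooks` is an arbitrary stand-in for the learned neural network.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Codebook = List[Vector]
# A codebook function maps (step index, partial reconstruction) to a codebook.
CodebookFn = Callable[[int, Vector], Codebook]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def rq_encode(x: Vector, codebook_fn: CodebookFn, num_steps: int) -> Tuple[List[int], Vector]:
    """Greedy RQ: at each step, quantize the residual of the previous steps."""
    x_hat = [0.0] * len(x)
    codes = []
    for m in range(num_steps):
        residual = [xi - hi for xi, hi in zip(x, x_hat)]
        codebook = codebook_fn(m, x_hat)
        # Pick the codeword closest to the current residual.
        k = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        codes.append(k)
        x_hat = [hi + ci for hi, ci in zip(x_hat, codebook[k])]
    return codes, x_hat


def rq_decode(codes: List[int], codebook_fn: CodebookFn, dim: int) -> Vector:
    """Rebuild the vector; the decoder sees the same x_hat at every step."""
    x_hat = [0.0] * dim
    for m, k in enumerate(codes):
        codebook = codebook_fn(m, x_hat)
        x_hat = [hi + ci for hi, ci in zip(x_hat, codebook[k])]
    return x_hat


# Hypothetical toy codebooks: 2 steps, 4 codewords each, dimension 2.
BASE = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],
    [[0.25, 0.25], [0.25, -0.25], [-0.25, 0.25], [-0.25, -0.25]],
]


def fixed_codebooks(m: int, x_hat: Vector) -> Codebook:
    """Conventional RQ: the step-m codebook ignores the partial reconstruction."""
    return BASE[m]


def conditioned_codebooks(m: int, x_hat: Vector) -> Codebook:
    """Assumed stand-in for a learned network: codewords depend on x_hat."""
    scale = 1.0 + 0.5 * abs(x_hat[0])  # arbitrary rule, for illustration only
    return [[scale * c for c in codeword] for codeword in BASE[m]]


if __name__ == "__main__":
    x = [0.9, 0.3]
    for fn in (fixed_codebooks, conditioned_codebooks):
        codes, recon = rq_encode(x, fn, num_steps=2)
        print(fn.__name__, codes, recon, rq_decode(codes, fn, dim=2))
```

The point of the sketch is structural: because the partial reconstruction is known to both encoder and decoder at every step, a codebook that depends on it can be regenerated at decode time from the codes alone. The untrained scaling rule here is not expected to improve accuracy; in the paper that role is played by a trained network.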
