Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks

Théophane Vallaeys; Matthew Muckley; Jakob Verbeek; Matthijs Douze

TL;DR

QINCo2 extends QINCo, a multi-codebook vector quantizer whose implicit neural codebooks condition the codebook at each step on the reconstruction from previous steps. It adds codeword pre-selection and beam search for encoding, a fast approximate decoder built on codeword pairs that produces short-lists for large-scale search, and an optimized training procedure and network architecture. In experiments on four datasets, it reduces reconstruction MSE by 34% for 16-byte codes on BigANN and improves search accuracy by 24% for 8-byte codes on Deep1M relative to the previous state of the art. The method targets both high-accuracy compression and billion-scale nearest neighbor search, and encoding time can be traded against accuracy at a fixed decoding cost.

Abstract

Vector quantization is a fundamental technique for compression and large-scale nearest neighbor search. For high-accuracy operating points, multi-codebook quantization associates data vectors with one element from each of multiple codebooks. An example is residual quantization (RQ), which iteratively quantizes the residual error of previous steps. Dependencies between the different parts of the code are, however, ignored in RQ, which leads to suboptimal rate-distortion performance. QINCo recently addressed this inefficiency by using a neural network to determine the quantization codebook in RQ based on the vector reconstruction from previous steps. In this paper we introduce QINCo2, which extends and improves QINCo with (i) improved vector encoding using codeword pre-selection and beam search, (ii) a fast approximate decoder leveraging codeword pairs to establish accurate short-lists for search, and (iii) an optimized training procedure and network architecture. We conduct experiments on four datasets to evaluate QINCo2 for vector compression and billion-scale nearest neighbor search. We obtain outstanding results in both settings, improving the state-of-the-art reconstruction MSE by 34% for 16-byte vector compression on BigANN, and search accuracy by 24% with 8-byte encodings on Deep1M.
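As a rough illustration of the residual quantization step that the abstract describes, the sketch below encodes a vector greedily, one codebook at a time, and decodes it by summing the selected codewords. It also includes a beam-search encoder over the same fixed codebooks. This is plain RQ with static codebooks, not the QINCo2 method: the neural, reconstruction-dependent codebooks and the codeword pre-selection are omitted, and the function names (`rq_encode`, `rq_decode`, `rq_encode_beam`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rq_encode(x, codebooks):
    """Greedy RQ: at each step pick the codeword nearest to the residual."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        k = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], residual))
        codes.append(k)
        # The next step quantizes what this step failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes


def rq_decode(codes, codebooks):
    """Reconstruct a vector as the sum of the selected codewords."""
    recon = [0.0] * len(codebooks[0][0])
    for k, codebook in zip(codes, codebooks):
        recon = [r + c for r, c in zip(recon, codebook[k])]
    return recon


def rq_encode_beam(x, codebooks, beam_size):
    """Beam-search RQ: keep the beam_size best partial codes at each step.

    Assumption: exhaustive expansion over every codeword, with no
    pre-selection of candidates.
    """
    # Each hypothesis is (squared error, codes so far, current residual).
    beam = [(0.0, [], list(x))]
    for codebook in codebooks:
        candidates = []
        for _, codes, residual in beam:
            for k, codeword in enumerate(codebook):
                new_residual = [r - c for r, c in zip(residual, codeword)]
                error = sum(v * v for v in new_residual)
                candidates.append((error, codes + [k], new_residual))
        candidates.sort(key=lambda item: item[0])
        beam = candidates[:beam_size]
    return beam[0][1]
```

With a beam size of 1 the beam-search encoder reduces to the greedy one. A larger beam can recover code combinations that the greedy choice at an early step rules out, at the cost of evaluating more candidates per step.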

Paper Structure (16 sections, 8 equations, 9 figures, 7 tables)

Introduction
Background and related work
Implicit neural codebooks
Notation and background
Improved implicit neural codebooks: encoding
Large-scale nearest neighbor search
Experimental validation
Experimental setup
Vector compression
Large-scale vector search
Conclusion
Implementation details
QINCo2 architecture
Training QINCo2
Large-scale search with Faiss
...and 1 more section

Figures (9)

Figure 1: Pareto fronts of quantization error on the BigANN1M dataset using 8-byte codes. Left: models with $L\!=\!8$ blocks, with each curve using a different number of blocks $L_s$ for pre-selection. Each curve covers models that differ in the number of pre-selected codewords $A$ and beam size $B$. Models are trained using $A=8$ and $B\in\{2,\dots,32\}$, and evaluated with $A\in\{8,\dots,64\}$ and $B\in\{2,\dots,128\}$. Stars show models using $A\!=\!8,B\!=\!16$ for evaluation. All models have the same decoding speed. Right: models with different numbers of residual blocks $L$. For models on one curve, the decoding time is fixed, and the encoding time is varied by changing $A$ and $B$.
Figure 2: Pareto-optimal front of QINCo2 operating points for MSE and encoding time. Evaluation on 10M vectors, varying the model parameters $A$, $B$, $L$, ${d_\textrm{e}}$ and ${d_\textrm{h}}$ when encoding using $M=8$ bytes on BigANN1M. Models are trained with different $A,B\in\{16,32\}$, and evaluated with a range of values up to $A=64$ and $B=32$. Marker shape is set according to the product of the encoding parameters $A\times B$, and color according to the decoding time, determined by the network depth ($L$) and width (${d_\textrm{e}}, {d_\textrm{h}}$). Results for QINCo are shown as yellow stars for comparison.
Figure 3: Search results using QINCo2 decoder and approximate decoders for QINCo2 codes. For each combination of dataset and bitrate, we report the retrieval accuracy over 1M vectors, as well as the accuracy of QINCo2-S over a shortlist of 10 elements generated by the method.
Figure 4: Retrieval accuracy/efficiency trade-off on the BigANN1B dataset in terms of queries per second (QPS) and recall (R@1) when combining PQ, RQ, QINCo, and QINCo2 with IVF.
Figure S1: MSE of QINCo2 and previous methods at different bitrates. Dotted lines show the bitrate reduction of QINCo2 compared to previous methods at a fixed MSE.
...and 4 more figures
