Spreading vectors for similarity search

Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

TL;DR

The paper tackles similarity search over multi-dimensional vectors by reversing the usual approach: instead of adapting the quantizer to the data, it adapts the data to a fixed quantizer. A learned "catalyzer" maps inputs to a uniform distribution on a spherical latent space while preserving neighborhood relations. The paper introduces the KoLeo differential-entropy regularizer and combines it with a rank-preserving triplet loss to produce representations that suit fixed discretizers such as binary signs and spherical lattices. Experiments on Deep1M and BigAnn show that lattice quantizers applied after the catalyzer outperform traditional PQ/OPQ baselines and scale to large datasets, with end-to-end training providing additional benefits when coupled with discretization. The work also shows that the catalyzer can serve as a generic preprocessing step for various quantizers, and it provides open-source code.
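To make the idea of a fixed, parameter-free discretizer concrete, here is a minimal NumPy sketch of the simplest one mentioned above, binary signs, followed by a Hamming-distance lookup. The function names (`sign_codes`, `hamming_search`), the random data, and the dimensions are illustrative assumptions, not taken from the paper or its code; in the paper's setting the vectors would be catalyzer outputs rather than random points.

```python
import numpy as np

def sign_codes(z):
    """Binarize vectors coordinate-wise: 1 where the value is >= 0, else 0."""
    return (np.asarray(z) >= 0).astype(np.uint8)

def hamming_search(query_code, db_codes, k=1):
    """Return the indices of the k database codes with the smallest Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Hypothetical stand-in for catalyzer outputs: random points on the unit sphere.
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 16))
db /= np.linalg.norm(db, axis=1, keepdims=True)

codes = sign_codes(db)                           # fixed quantizer, nothing to train
query = db[42] + 0.01 * rng.standard_normal(16)  # slightly perturbed database vector
nearest = hamming_search(sign_codes(query), codes, k=5)
print(nearest)
```

Because the quantizer has no parameters, code quality depends entirely on how the vectors are distributed: sign bits carry the most information when the points are spread evenly over the sphere.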

Abstract

Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods. State-of-the-art techniques learn parameters of quantizers on training data for optimal performance, thus adapting quantizers to the data. In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net whose last layer forms a fixed parameter-free quantizer, such as pre-defined points of a hyper-sphere. As a proxy objective, we design and train a neural network that favors uniformity in the spherical latent space, while preserving the neighborhood structure after the mapping. We propose a new regularizer derived from the Kozachenko-Leonenko differential entropy estimator to enforce uniformity and combine it with a locality-aware triplet loss. Experiments show that our end-to-end approach outperforms most learned quantization methods, and is competitive with the state of the art on widely adopted benchmarks. Furthermore, we show that training without the quantization step results in almost no difference in accuracy, but yields a generic catalyzer that can be applied with any subsequent quantizer.
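The two loss terms described in the abstract can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the authors' implementation. It assumes the regularizer is the negative mean log distance from each point to its nearest neighbor in the batch (the form a Kozachenko-Leonenko estimator suggests), that the triplet loss uses squared Euclidean distances with zero margin, and that the two are combined as `rank + lam * koleo`. All function names, the `eps` constant, and the choice to regularize only the anchor outputs are assumptions.

```python
import numpy as np

def koleo_regularizer(z, eps=1e-12):
    """Negative mean log nearest-neighbor distance within the batch (needs >= 2 points).

    Low values mean the points are well spread; tightly packed points are penalized.
    """
    z = np.asarray(z, dtype=float)
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)          # ignore each point's distance to itself
    rho = np.sqrt(sq.min(axis=1))         # distance to the nearest other point
    return -np.mean(np.log(rho + eps))

def triplet_rank_loss(anchor, positive, negative):
    """Zero-margin hinge: penalize triplets where the negative is closer than the positive."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg))

def catalyzer_loss(anchor, positive, negative, lam=0.02):
    """Assumed combination: rank term plus lam times the uniformity term on the anchors."""
    return triplet_rank_loss(anchor, positive, negative) + lam * koleo_regularizer(anchor)

# Hypothetical batch of already-mapped vectors on the unit sphere.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 8))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.05 * rng.standard_normal(a.shape)   # near each anchor
n = rng.permutation(a)                        # other batch members used as negatives
print(catalyzer_loss(a, p, n, lam=0.02))
```

In a real training loop these terms would be written in an autodiff framework so that gradients flow back into the network; the NumPy version only shows what is being measured.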

Paper Structure

This paper contains 20 sections, 6 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Our method learns a network that encodes the input space $\mathbb R^d$ into a code $c(x)$. It is learned end-to-end, yet the part of the network in charge of the discretization operation is fixed in advance, thereby avoiding optimization problems. The learnable function $f$, namely the "catalyzer", is optimized to increase the quality of the subsequent coding stage.
  • Figure 2: Illustration of our method, which takes as input a set of samples from an unknown distribution. We learn a neural network that aims at preserving the neighborhood structure in the input space while covering the output space as uniformly as possible. This trade-off is controlled by a parameter $\lambda$. The case $\lambda=0$ keeps the locality of the neighbors but does not cover the output space. Conversely, when the loss degenerates to the differential entropy regularizer ($\lambda \to \infty$), the neighbors are not maintained by the mapping. Intermediate values offer different trade-offs between neighbor fidelity and uniformity, which makes the output a suitable input for an efficient lattice quantizer (depicted here by the hexagonal lattice $A_2$).
  • Figure 3: Histograms of the distance between a query point and its 1st (resp. 100th) nearest neighbor, in the original space (left) and after our catalyzer (right). In the original space, the two histograms have a significant overlap, which means that the 100th nearest neighbor of one query is often closer than the 1st neighbor of another query. This overlap is significantly reduced by our catalyzer.
  • Figure 4: Impact of the regularizer on the output distribution. Each column corresponds to a different amount of regularization (left: $\lambda=0$, middle: $\lambda=0.02$, right: $\lambda=1$). Each row corresponds to a different random projection of the empirical distribution, parametrized by an angle in $[0, 2\pi]$. The marginal distributions for these two views are much more uniform with our KoLeo regularizer, which is a consequence of the higher uniformity in the high-dimensional latent space.
  • Figure 5: Comparison of the performance of the product lattice vs. OPQ on Deep1M (left) and BigAnn1M (right). Our method maps the input vectors to a ${d_\mathrm{out}}$-dimensional space, which is then quantized with a lattice of radius $r$. We obtain the curves by varying the radius $r$.
  • ...and 2 more figures
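
The overlap described in the Figure 3 caption can be quantified with a short NumPy sketch. It computes each query's distance to its 1st and 100th nearest database vectors, then the fraction of ordered query pairs for which one query's 100th neighbor is closer than another query's 1st neighbor. The function names, the random Gaussian data, and this particular overlap statistic are illustrative assumptions, not the paper's evaluation protocol. Queries are assumed to be disjoint from the database, since otherwise the 1st neighbor would be the query itself.

```python
import numpy as np

def knn_distances(queries, database, ranks=(1, 100)):
    """Distance from each query to its r-th nearest database vector, for each rank r."""
    d2 = (np.sum(queries ** 2, axis=1)[:, None]
          - 2.0 * queries @ database.T
          + np.sum(database ** 2, axis=1)[None, :])
    d = np.sqrt(np.maximum(d2, 0.0))      # clip tiny negatives from rounding
    d.sort(axis=1)                        # ascending distances per query
    return {r: d[:, r - 1] for r in ranks}

def overlap_fraction(d_first, d_kth):
    """Fraction of ordered query pairs (a, b) where a's k-th neighbor is closer than b's 1st."""
    return np.mean(d_kth[:, None] < d_first[None, :])

# Hypothetical data standing in for descriptors in the original space.
rng = np.random.default_rng(0)
database = rng.standard_normal((2000, 32))
queries = rng.standard_normal((200, 32))

dists = knn_distances(queries, database, ranks=(1, 100))
print(overlap_fraction(dists[1], dists[100]))
```

Running the same measurement on the original vectors and on the mapped vectors would give a single number per space to compare, complementing the histograms.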