GARLIC: GAussian Representation LearnIng for spaCe partitioning
Panagiotis Rigas, Panagiotis Drivas, Charalambos Tzamos, Ioannis Chamodrakas, George Ioannakis, Leonidas J. Guibas, Ioannis Z. Emiris
TL;DR
The paper addresses efficient high-dimensional Euclidean ANN search by learning geometry-aware space partitions. It introduces GARLIC, which represents $\mathbb{R}^d$ with anisotropic, density-adaptive Gaussian cells that align with local principal directions and refine capacity locally via split/clone operations. The learning objective is information-theoretic, balancing expected coverage, overlap, and probe-budget efficiency. At query time, Mahalanobis distance selects cells and localized quantization prunes candidates. Empirical results on standard benchmarks show competitive recall–efficiency trade-offs and robustness to training on a small subset of the data, and the paper provides complexity analyses for build, query, and storage. Overall, GARLIC demonstrates a principled, geometry-aware approach to ANN that improves locality and efficiency over isotropic or fixed-resolution partitions.
Abstract
We present \textbf{GARLIC}, a representation learning approach for Euclidean approximate nearest neighbor (ANN) search in high dimensions. Existing partitions tend to rely on isotropic cells, fixed global resolution, or balanced constraints, which fragment dense regions and merge unrelated points in sparse ones, thereby increasing the candidate count when probing only a few cells. Our method instead partitions \(\mathbb{R}^d\) into anisotropic Gaussian cells whose shapes align with local geometry and sizes adapt to data density. Information-theoretic objectives balance coverage, overlap, and geometric alignment, while split/clone refinement introduces Gaussians only where needed. At query time, Mahalanobis distance selects relevant cells and localized quantization prunes candidates. This yields partitions that reduce cross-cell neighbor splits and candidate counts under small probe budgets, while remaining robust even when trained on only a small fraction of the dataset. Overall, GARLIC introduces a geometry-aware space-partitioning paradigm that combines information-theoretic objectives with adaptive density refinement, offering competitive recall--efficiency trade-offs for Euclidean ANN search.
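To make the query-time cell selection concrete, the following is a minimal sketch of ranking Gaussian cells by squared Mahalanobis distance, $(q-\mu_k)^\top \Sigma_k^{-1} (q-\mu_k)$, and keeping the closest `n_probe` of them. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mahalanobis_sq`, `select_cells`), the parameter `n_probe`, and the toy means and covariances are all hypothetical, and the sketch omits the learned objective, the split/clone refinement, and the localized quantization step.

```python
import numpy as np

def mahalanobis_sq(query, means, precisions):
    """Squared Mahalanobis distance from one query to each of K Gaussian cells.

    query:      (d,)      query vector
    means:      (K, d)    cell centers
    precisions: (K, d, d) inverse covariance matrix of each cell
    """
    diff = query[None, :] - means  # (K, d)
    # For each cell k: diff_k^T @ precision_k @ diff_k
    return np.einsum("kd,kde,ke->k", diff, precisions, diff)

def select_cells(query, means, covariances, n_probe):
    """Return indices of the n_probe closest cells and all squared distances.

    Hypothetical helper: inverts the covariances on every call for clarity;
    a real index would precompute and cache the precisions.
    """
    precisions = np.linalg.inv(covariances)  # batched inverse over K cells
    d2 = mahalanobis_sq(query, means, precisions)
    return np.argsort(d2)[:n_probe], d2

# Toy 2-D example (assumed values): cell 0 is elongated along the x-axis,
# cell 1 is isotropic.
means = np.array([[0.0, 0.0],
                  [2.0, 2.0]])
covariances = np.array([[[9.0, 0.0], [0.0, 0.25]],
                        [[1.0, 0.0], [0.0, 1.0]]])
query = np.array([3.0, 0.0])

probed, d2 = select_cells(query, means, covariances, n_probe=1)

# Baseline for comparison: nearest cell center under plain Euclidean distance.
euclidean_nearest = int(np.argmin(np.linalg.norm(means - query, axis=1)))
```

In this toy setup the query lies along the long axis of cell 0, so the Mahalanobis ranking probes cell 0, whereas the Euclidean distance to the centers would pick cell 1. This is the kind of difference an anisotropic partition is meant to exploit when the probe budget is small.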
