Panorama: Fast-Track Nearest Neighbors
Vansh Ramani, Alexis Schlomer, Akash Nayar, Sayan Ranu, Jignesh M. Patel, Panagiotis Karras
TL;DR
Panorama tackles the refinement bottleneck in Approximate Nearest Neighbor Search (ANNS) over high-dimensional neural embeddings. It learns data-driven orthogonal transforms that concentrate energy in the leading dimensions, so candidates can be pruned from partial distance computations before their exact kNN distances are fully evaluated. The transforms are parameterized with the Cayley transform on the Stiefel manifold and trained with an energy-compaction loss that drives exponential decay of tail energy. They are paired with a co-designed, memory-layout-aware implementation for both contiguous and non-contiguous indexes. Theoretical guarantees establish near-linear end-to-end speedups under energy-compaction assumptions, with robustness to out-of-distribution queries captured by an effective decay rate $\alpha_{\text{eff}}$. Empirically, Panorama yields 2–30× speedups across diverse datasets and ANN indexes without recall loss, which matters for scalable retrieval in applications ranging from image search to RAG pipelines.
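The Cayley parameterization mentioned above can be illustrated with a minimal sketch of the classical Cayley map. It assumes a square d×d transform and an unconstrained parameter matrix `W`. The skew-symmetric part A = W − Wᵀ is mapped to Q = (I − A)(I + A)⁻¹, which is always orthogonal, so it preserves distances exactly. The helper names (`cayley`, `transform`, `leading_energy_fraction`) are hypothetical. Panorama's actual parameterization, its energy-compaction loss, and its training procedure are not reproduced here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple loop: (len(a) x len(b)) times (len(b) x len(b[0])).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(m: Matrix) -> Matrix:
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [M | I].
    n = len(m)
    aug = [row[:] + e for row, e in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cayley(w: Matrix) -> Matrix:
    """Map an unconstrained square matrix W to an orthogonal matrix Q.

    A = W - W^T is skew-symmetric, so I + A is always invertible and
    Q = (I - A)(I + A)^{-1} satisfies Q^T Q = I (classical Cayley map).
    """
    n = len(w)
    a = [[w[i][j] - w[j][i] for j in range(n)] for i in range(n)]
    i_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    i_plus_a = [[(1.0 if i == j else 0.0) + a[i][j] for j in range(n)] for i in range(n)]
    return matmul(i_minus_a, inverse(i_plus_a))


def transform(q: Matrix, x: Sequence[float]) -> List[float]:
    # y = Q x; because Q is orthogonal, ||y|| == ||x|| and pairwise distances are preserved.
    return [sum(qi * xi for qi, xi in zip(row, x)) for row in q]


def leading_energy_fraction(x: Sequence[float], k: int) -> float:
    # Share of the squared L2 norm held by the first k coordinates.
    total = sum(v * v for v in x)
    return sum(v * v for v in x[:k]) / total if total else 0.0
```

In a learned setting, `W` would be the trainable parameter, and a loss that rewards a high leading-energy fraction (or fast tail decay) over training embeddings would be minimized by gradient descent. Any such loss here is an assumption, and the training loop is not shown. The Cayley map is convenient because every value of `W` yields a valid rotation, so no projection back onto the orthogonal group is needed after each step.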
Abstract
Approximate Nearest-Neighbor Search (ANNS) efficiently finds data items whose embeddings are close to that of a given query in a high-dimensional space, aiming to balance accuracy with speed. ANNS is used in recommendation systems, image and video retrieval, natural language processing, and retrieval-augmented generation (RAG), and algorithms such as IVFPQ, HNSW graphs, Annoy, and MRPT use graph, tree, clustering, and quantization techniques to navigate large vector spaces. Despite this progress, ANNS systems spend up to 99% of query time computing distances in their final refinement phase. In this paper, we present Panorama, a machine-learning-driven approach that tackles this verification bottleneck. It uses data-adaptive learned orthogonal transforms that facilitate the accretive refinement of distance bounds. Such transforms compact over 90% of signal energy into the first half of the dimensions, enabling early candidate pruning with partial distance computations. We integrate Panorama into state-of-the-art ANNS methods, namely IVFPQ/Flat, HNSW, MRPT, and Annoy, without modifying their indexes, using level-major memory layouts, SIMD-vectorized partial distance computations, and cache-aware access patterns. Experiments across diverse datasets, from image-based CIFAR-10 and GIST to modern embedding spaces including OpenAI's Ada 2 and Large 3, demonstrate that Panorama affords a 2–30× end-to-end speedup with no recall loss.
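Early pruning with partial distance computations can be sketched as an exact k-NN refinement loop that scans dimensions level by level. The sketch assumes that vectors have already been rotated by an energy-compacting orthogonal transform. It splits the dimensions into hypothetical levels (`level_ends`) and uses a simple bound from the reverse triangle inequality: ‖q − x‖² ≥ ‖q_head − x_head‖² + (‖q_tail‖ − ‖x_tail‖)². This bound was chosen here for illustration. Panorama's own bounds, level-major layout, and SIMD kernels are not reproduced, and `knn_with_pruning` and `tail_norms` are hypothetical names.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def tail_norms(v: Sequence[float], level_ends: Sequence[int]) -> List[float]:
    # tail[l] = ||v[level_ends[l]:]||, the energy not yet scanned after level l.
    return [math.sqrt(sum(c * c for c in v[end:])) for end in level_ends]


def knn_with_pruning(query: Sequence[float], candidates: Sequence[Sequence[float]],
                     k: int, level_ends: Sequence[int]) -> Tuple[List[Tuple[float, int]], int]:
    """Exact k-NN by squared L2 distance, scanning dimensions level by level.

    level_ends: increasing end index of each level; the last must equal the dimension.
    Returns the sorted (distance, index) results and the number of coordinates scanned.
    """
    q_tail = tail_norms(query, level_ends)
    heap: List[Tuple[float, int]] = []  # max-heap of the current k best, via negated distances
    scanned = 0
    for idx, x in enumerate(candidates):
        x_tail = tail_norms(x, level_ends)  # would be precomputed once per vector in practice
        threshold = -heap[0][0] if len(heap) == k else math.inf
        partial, start = 0.0, 0
        for lvl, end in enumerate(level_ends):
            partial += sum((a - b) ** 2 for a, b in zip(query[start:end], x[start:end]))
            scanned += end - start
            start = end
            # Lower bound on the exact distance: scanned part plus the tail-norm gap.
            # It tightens as the unscanned tail energy shrinks.
            lower = partial + (q_tail[lvl] - x_tail[lvl]) ** 2
            if lower >= threshold:
                break  # cannot beat the current k-th best, so prune this candidate
        else:
            # Not pruned: partial is now the exact distance and is below the threshold.
            if len(heap) < k:
                heapq.heappush(heap, (-partial, idx))
            else:
                heapq.heapreplace(heap, (-partial, idx))
    return sorted((-neg, i) for neg, i in heap), scanned
```

On raw, unrotated vectors, the tail norms stay large and the bound prunes little. After an energy-compacting transform, most of each vector's norm sits in the first levels, so `lower` approaches the exact distance early and most candidates exit after a few levels. Results remain exact because a candidate is discarded only when its lower bound already reaches the current k-th best distance. The loop here is candidate-major for readability. A level-major layout would instead process one level across many candidates at a time.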
