Results of the Big ANN: NeurIPS'23 competition

Harsha Vardhan Simhadri; Martin Aumüller; Amir Ingber; Matthijs Douze; George Williams; Magdalen Dobson Manohar; Dmitry Baranchuk; Edo Liberty; Frank Liu; Ben Landrum; Mazin Karjikar; Laxman Dhulipala; Meng Chen; Yue Chen; Rui Ma; Kai Zhang; Yuzheng Cai; Jiayang Shi; Yizhuo Chen; Weiguo Zheng; Zihao Wan; Jie Yin; Ben Huang

TL;DR

The paper surveys NeurIPS 2023's Big ANN Challenge, which pushes practical ANN indexing and search across four workloads (filtered, out-of-distribution, sparse, and streaming) under constrained hardware. It details the tracks, datasets, evaluation framework, and submission protocol, and reports that top solutions significantly outperform industry baselines through graph-based indexing, quantization, and hybrid vector-metadata strategies. The competition demonstrates notable academic-industrial collaboration, reveals strengths and limitations of current approaches, and provides insights into future directions for robust, real-world ANN systems. An open-source framework and ongoing leaderboard are highlighted to sustain progress and reproducibility in the vector-search community.

Abstract

The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state of the art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search [Simhadri et al., 2021], this competition addressed filtered search, out-of-distribution data, and sparse and streaming variants of ANN search. Participants developed and submitted solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency over industry-standard baselines, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the approaches of the top-performing submissions, providing insights into current advances and future directions in approximate nearest neighbor search.

Paper Structure (23 sections, 5 figures, 6 tables)

Introduction
Broader Impact.
Limitations.
Tracks and datasets
Filtered Search Track
Out-Of-Distribution Track
Sparse Track
Streaming Track
Evaluation
Metrics
Search accuracy.
Evaluation protocol
Details of a submission.
Competition results: baselines and notable approaches
Filtered Search Track
...and 8 more sections

Figures (5)

Figure 1: Example images from the Filtered track, and their associated tags: query (left) and database (right). The images are represented by CLIP embedding vectors.
Figure 2: PCA projection of 1000 arbitrary query vectors and 1000 database vectors from the OOD dataset. Left: the two first PCA dimensions, right: the two following ones.
Figure 3: Performance of the different algorithms in the filter track on the private query set.
Figure 4: Performance of the different algorithms in the OOD track.
Figure 5: Performance of the different algorithms in the sparse track on the private query set.

Theorems & Definitions (1)

Definition 1
