Effective and General Distance Computation for Approximate Nearest Neighbor Search

Mingyu Yang; Wentao Li; Jiabao Jin; Xiaoyao Zhong; Xiangyu Wang; Zhitao Shen; Wei Jia; Wei Wang

TL;DR

This work leverages the data distribution to improve distance approximation via orthogonal projection, addressing the effectiveness limitation of ADSampling. It also adopts a data-driven approach to distance correction that decouples the correction process from the distance approximation process, overcoming the generality limitation of ADSampling.

Abstract

Approximate K Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. In AKNN search, distance computation is the core task that dominates the runtime. Existing approaches typically use approximate distances to improve computational efficiency, often at the cost of reduced search accuracy. To address this issue, the state-of-the-art method, ADSampling, employs random projections to estimate approximate distances and introduces an additional distance correction process to mitigate accuracy loss. However, ADSampling has limitations in both effectiveness and generality, primarily due to its reliance on random projections for distance approximation and correction. To address the effectiveness limitations of ADSampling, we leverage data distribution to improve distance computation via orthogonal projection. Furthermore, to overcome the generality limitations of ADSampling, we adopt a data-driven approach to distance correction, decoupling the correction process from the distance approximation process. Extensive experiments demonstrate the superiority and effectiveness of our method. In particular, compared to ADSampling, our method achieves a speedup of 1.6 to 2.1 times on real-world datasets while providing higher accuracy.
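To make the motivation concrete: approximate-distance methods avoid scanning all D dimensions when a cheaper estimate already shows that a candidate cannot enter the current top-K. The sketch below shows only the simplest deterministic variant of that idea, partial-distance pruning. It is not ADSampling and not the method proposed in this paper, both of which use projected estimates followed by a correction step. The function name, the `threshold_sq` parameter, and the `step` chunk size are hypothetical names introduced for illustration.

```python
def pruned_squared_distance(query, point, threshold_sq, step=8):
    """Accumulate squared differences in chunks of `step` dimensions.

    Returns (value, exact). If the running sum exceeds `threshold_sq`,
    scanning stops and (running sum, False) is returned; the running sum
    is then a lower bound on the full squared distance. Otherwise the
    full squared distance is returned with exact=True.
    """
    partial = 0.0
    for start in range(0, len(query), step):
        chunk_q = query[start:start + step]
        chunk_p = point[start:start + step]
        for q, p in zip(chunk_q, chunk_p):
            diff = q - p
            partial += diff * diff
        # The running sum never decreases, so once it passes the
        # threshold the candidate can be discarded without reading
        # the remaining dimensions.
        if partial > threshold_sq:
            return partial, False
    return partial, True


if __name__ == "__main__":
    query = [0.0] * 16
    far_point = [10.0] * 16
    near_point = [0.1] * 16
    print(pruned_squared_distance(query, far_point, 4.0, step=4))
    print(pruned_squared_distance(query, near_point, 4.0, step=4))
```

Because an orthogonal transformation leaves Euclidean distances unchanged, the same lower-bound argument holds after the vectors are rotated into another orthonormal basis. If that basis concentrates most of the variance in the leading coordinates, the running sum tends to cross the threshold sooner.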

Paper Structure (21 sections, 3 theorems, 4 equations, 10 figures, 3 tables, 2 algorithms)


Introduction
Preliminaries
The AKNN Search
Existing Distance Computation Methods
Problem Analysis
An Improved Projection-Based Distance Computation
Distance Decomposition
An Improved Distance Estimation
An Improved Distance Correction
Implementation
A General Distance Computation
A Data-Driven Distance Correction
Implementation
Discussion of Out-of-Distribution Query
Analysis of Proposed Methods
...and 6 more sections

Key Result

Lemma 1

For a given point $x \in \mathbb{R}^D$, a random projection $P\in \mathbb{R}^{d \times D}$ preserves its Euclidean norm within a multiplicative error bound $\epsilon$ with probability
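As a rough empirical illustration of the norm-preservation property this lemma refers to, the following sketch projects a vector with a random matrix many times and counts how often the projected norm stays within a relative error $\epsilon$ of the true norm. It rests on assumptions not taken from the lemma: a Gaussian matrix with i.i.d. $N(0, 1/d)$ entries is used as a common stand-in for a random projection, and the lemma's exact construction, scaling, and constants are not reproduced. The function names are hypothetical.

```python
import math
import random


def projected_norm(x, d, rng):
    """Norm of P x, where P is a d x D matrix with i.i.d. N(0, 1/d) entries.

    The matrix is never stored: each row is sampled, dotted with x,
    and discarded.
    """
    sigma = 1.0 / math.sqrt(d)  # standard deviation of each entry
    total = 0.0
    for _ in range(d):
        row_dot = sum(rng.gauss(0.0, sigma) * xi for xi in x)
        total += row_dot * row_dot
    return math.sqrt(total)


def fraction_within(x, d, eps, trials, seed=0):
    """Fraction of random projections with | ||Px|| - ||x|| | <= eps * ||x||."""
    rng = random.Random(seed)
    true_norm = math.sqrt(sum(xi * xi for xi in x))
    hits = 0
    for _ in range(trials):
        if abs(projected_norm(x, d, rng) - true_norm) <= eps * true_norm:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    x = [1.0] * 64
    for eps in (0.05, 0.1, 0.2, 0.5):
        print(eps, fraction_within(x, d=32, eps=eps, trials=100))
```

Increasing `d` or `eps` should raise the observed fraction, which matches the qualitative shape of a bound of this kind: the failure probability shrinks as the projected dimension grows and as the tolerated error grows.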

Figures (10)

Figure 1: The Distribution of Estimation Error
Figure 2: The Empirical Analysis of the New Error Bound
Figure 3: The Example of How Our Methods Work
Figure 4: The Example of the Learned Decision Boundary
Figure 5: The Test of Performance Among Various Methods
...and 5 more figures

Theorems & Definitions (5)

Lemma 1 (ADSampling, Gao and Long, SIGMOD 2023)
Theorem 1
Lemma 2
Example 1
Example 2
