Effective and General Distance Computation for Approximate Nearest Neighbor Search

Mingyu Yang, Wentao Li, Jiabao Jin, Xiaoyao Zhong, Xiangyu Wang, Zhitao Shen, Wei Jia, Wei Wang

TL;DR

This work leverages the data distribution to improve distance approximation via orthogonal projection, addressing the effectiveness limitation of ADSampling. It also adopts a data-driven approach to distance correction that decouples the correction process from the distance approximation process, overcoming the generality limitation of ADSampling.

Abstract

Approximate K Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. In AKNN search, distance computation is the core task that dominates the runtime. Existing approaches typically use approximate distances to improve computational efficiency, often at the cost of reduced search accuracy. To address this issue, the state-of-the-art method, ADSampling, employs random projections to estimate approximate distances and introduces an additional distance correction process to mitigate accuracy loss. However, ADSampling has limitations in both effectiveness and generality, primarily due to its reliance on random projections for distance approximation and correction. To address the effectiveness limitations of ADSampling, we leverage data distribution to improve distance computation via orthogonal projection. Furthermore, to overcome the generality limitations of ADSampling, we adopt a data-driven approach to distance correction, decoupling the correction process from the distance approximation process. Extensive experiments demonstrate the superiority and effectiveness of our method. In particular, compared to ADSampling, our method achieves a speedup of 1.6 to 2.1 times on real-world datasets while providing higher accuracy.
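
To make the contrast between random and data-aware projection concrete, the following is a minimal sketch and not the authors' implementation. It assumes NumPy, synthetic data whose variance decays across dimensions, and PCA as one illustrative way to derive an orthogonal projection from the data distribution (the abstract does not name the specific technique). All function names are hypothetical. The random estimate is rescaled by sqrt(D/d), while the PCA estimate is left unscaled, so it is always a lower bound on the true distance.

```python
import numpy as np

def random_orthogonal_projection(D, d, rng):
    """Return a d x D matrix with orthonormal rows taken from a random rotation."""
    gaussian = rng.standard_normal((D, D))
    q, _ = np.linalg.qr(gaussian)  # q is a D x D orthogonal matrix
    return q[:d]

def pca_projection(data, d):
    """Return a d x D matrix whose rows are the top-d principal directions.

    PCA is an assumption made for illustration only.
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, top].T

def demo(D=64, d=16, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic anisotropic data: the standard deviation decays with the dimension index.
    scales = 1.0 / np.arange(1, D + 1)
    data = rng.standard_normal((n, D)) * scales
    query = rng.standard_normal(D) * scales

    P_rand = random_orthogonal_projection(D, d, rng)
    P_pca = pca_projection(data, d)

    diffs = data - query
    true = np.linalg.norm(diffs, axis=1)
    # Random projection: rescale so the squared estimate is unbiased in expectation.
    est_rand = np.sqrt(D / d) * np.linalg.norm(diffs @ P_rand.T, axis=1)
    # Data-aware projection: unscaled, so it never exceeds the true distance.
    est_pca = np.linalg.norm(diffs @ P_pca.T, axis=1)
    return true, est_rand, est_pca

if __name__ == "__main__":
    true, est_rand, est_pca = demo()
    print("mean relative error (random):", np.mean(np.abs(est_rand - true) / true))
    print("mean relative error (PCA):   ", np.mean(np.abs(est_pca - true) / true))
```

On data of this kind, where most of the variance sits in a few directions, the data-aware projection uses the same number of dimensions but captures far more of each distance. That is the intuition behind the effectiveness claim; the correction step described in the abstract is not modeled here.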

Paper Structure

This paper contains 21 sections, 3 theorems, 4 equations, 10 figures, 3 tables, 2 algorithms.

Key Result

Lemma 1

For a given point $x \in \mathbb{R}^D$, a random projection $P \in \mathbb{R}^{d \times D}$ preserves its Euclidean norm within a multiplicative error bound $\epsilon$ with the probability of

Figures (10)

  • Figure 1: The Distribution of Estimation Error
  • Figure 2: The Empirical Analysis of the New Error Bound
  • Figure 3: The Example of How Our Methods Work
  • Figure 4: The Example of the Learned Decision Boundary
  • Figure 5: The Test of Performance Among Various Methods
  • ...and 5 more figures

Theorems & Definitions (5)

  • Lemma 1 (ADSampling [journals/sigmod/GaoL23])
  • Theorem 1
  • Lemma 2
  • Example 1
  • Example 2