
Probabilistic Kernel Function for Fast Angle Testing

Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa

TL;DR

This work tackles the problem of efficient angle testing in high-dimensional spaces for similarity search. It introduces two projection-based probabilistic kernel functions, KS1 and KS2, that rely on reference-angle concepts and deterministic projection structures to avoid Gaussian randomness and asymptotic requirements, achieving o(d) computation. The paper provides theoretical probability guarantees and a detailed complexity analysis, and demonstrates practical benefits: KS1 improves CEOs-based tasks such as k-MIPS, while KS2 enables a new probabilistic routing test that speeds up graph-based ANNS like HNSW, delivering substantial QPS gains. Together, these contributions offer a scalable, deterministic approach to fast angle testing with direct impact on fast similarity search in high dimensions.

Abstract

In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and employs a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and are shown, both theoretically and experimentally, to outperform Gaussian-distribution-based kernel functions. We apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves 2.5x to 3x higher queries-per-second (QPS) throughput than the widely used graph-based search algorithm HNSW.
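To make the two problems concrete, here is a minimal sketch of the exact (non-probabilistic) baseline. It assumes that "angle comparison" means deciding which of two vectors makes the smaller angle with the query and that "angle thresholding" means deciding whether the angle falls below a given threshold; this reading and the function names are our own illustration, not the paper's KS1 or KS2. Both exact tests need a full d-dimensional inner product, which is the per-test cost that probabilistic kernel functions try to avoid.

```python
import math


def cosine(a, b):
    # Exact cosine of the angle between nonzero vectors a and b: Theta(d) work.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def angle_compare(q, v1, v2):
    # Angle comparison (hypothetical helper): True if v1 makes a strictly
    # smaller angle with q than v2. Cosine is decreasing on [0, pi].
    return cosine(q, v1) > cosine(q, v2)


def angle_threshold(q, v, theta):
    # Angle thresholding (hypothetical helper): True if angle(q, v) < theta,
    # with theta in [0, pi]; equivalent to cos(angle) > cos(theta).
    return cosine(q, v) > math.cos(theta)


if __name__ == "__main__":
    q = [1.0, 0.0]
    print(angle_compare(q, [1.0, 1.0], [0.0, 1.0]))   # 45 deg vs 90 deg
    print(angle_threshold(q, [1.0, 1.0], math.pi / 3))  # 45 deg < 60 deg
```

Comparing cosines instead of calling `math.acos` avoids an inverse trigonometric call per test and is equivalent because cosine is monotone on [0, pi].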

Paper Structure

This paper contains 29 sections, 6 theorems, 41 equations, 8 figures, 5 tables, 6 algorithms.

Key Result

Lemma 1.3

(Theorem 1 in CEOs) Given two vectors $\bm{v}$, $\bm{q}$ on $\mathbb S^{d-1}$ and $m$ random vectors $\{\bm{u}_i\}^m_{i=1} \sim \mathcal{N}(0, I^d)$, let $\bm{u}_{\mathrm{max}} = {\mathop{\rm argmax}_{\bm{u}_i}} |\bm{q}^{\top}\bm{u}_i|$. As $m$ tends to infinity, we have:
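The setting of Lemma 1.3 can be sketched in pure Python as follows: draw $m$ Gaussian vectors, select $\bm{u}_{\mathrm{max}}$ by $|\bm{q}^{\top}\bm{u}_i|$, and read off $\bm{v}^{\top}\bm{u}_{\mathrm{max}}$. The helper names are hypothetical. Multiplying by $\mathrm{sign}(\bm{q}^{\top}\bm{u}_{\mathrm{max}})$ is our own choice so that the score is not sign-ambiguous. This illustrates the Gaussian-projection baseline only, not the proposed KS1.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(x):
    # Scale a nonzero vector x onto the unit sphere S^{d-1}.
    n = math.sqrt(dot(x, x))
    return [t / n for t in x]


def gaussian_vectors(m, d, rng):
    # m projection vectors u_i ~ N(0, I_d), as in the setting of the lemma.
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def argmax_abs_projection(q, us):
    # Index of u_max = argmax_i |q^T u_i|.
    return max(range(len(us)), key=lambda i: abs(dot(q, us[i])))


def ceos_style_score(q, v, us):
    # Sign-corrected projection of v onto u_max. Writing u = (q^T u) q + w with
    # w orthogonal to q (w is independent of q^T u for Gaussian u), we get
    #   sign(q^T u) * v^T u = |q^T u| * (q^T v) + sign(q^T u) * v^T w,
    # so the expected score grows linearly in q^T v (i.e. shrinks with the angle).
    u_max = us[argmax_abs_projection(q, us)]
    s = 1.0 if dot(q, u_max) >= 0 else -1.0
    return s * dot(v, u_max)


if __name__ == "__main__":
    rng = random.Random(0)
    d, m = 32, 512
    q = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
    us = gaussian_vectors(m, d, rng)
    near = unit([qi + 0.3 * rng.gauss(0.0, 1.0) for qi in q])
    far = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
    for name, v in (("near", near), ("far", far)):
        print(name, round(dot(q, v), 3), round(ceos_style_score(q, v, us), 3))
```

In the decomposition above, the noise term $\bm{v}^{\top}\bm{w}$ has variance $1 - (\bm{q}^{\top}\bm{v})^2$ independently of $m$, while the signal is scaled by $|\bm{q}^{\top}\bm{u}_{\mathrm{max}}|$, the maximum of $m$ absolute standard normals. That maximum grows with $m$, so a larger $m$ sharpens the ranking at the cost of more projections.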

Figures (8)

  • Figure 1: Recall-QPS evaluation of ANNS. $k$ = 10.
  • Figure 2: Impact of $L$ (see the appendix for other datasets). $k$ = 10. The y-axis of the upper figures denotes the additional index cost (%) of HNSW+PEOs compared to the original HNSW.
  • Figure 3: An illustration of Falconn, CEOs, and the proposed structure KS1.
  • Figure 4: Numerical computation under different $m$'s and $d$'s. The y-axis denotes the cosine of reference angle.
  • Figure 5: Recall-QPS evaluation of ANNS. $k$ = 100.

Theorems & Definitions (9)

  • Lemma 1.3
  • Definition 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 5.1
  • Definition 6.1: Probabilistic Routing [PEOs]
  • Corollary 6.2
  • Lemma B.1
  • Definition B.2