ESG: Elastic Graphs for Range-Filtering Approximate k-Nearest Neighbor Search

Mingyu Yang, Wentao Li, Zhitao Shen, Chuan Xiao, Wei Wang

TL;DR

This paper tackles Range-Filtering Approximate k-Nearest Neighbors (RFAKNN) by introducing Elastic Graphs (ESG), a framework that exploits elastic relaxation of query ranges to preserve accuracy while achieving bounded, typically sublinear, query complexity. It provides theoretical guarantees showing that using superset ranges yields correct results and that the complexity scales modestly with the elastic factor, enabling efficient PostFiltering-based searches. The authors present two instantiations: ESG$_{1D}$ for half-bounded queries using $\log N$ graphs over ranges $[1,N/2^i]$, and ESG$_{2D}$ for general queries via a segment-tree of graphs with a two-subrange query-processing guarantee. Extensive experiments on benchmarks like SIFT, DEEP, GLOVE, WIT, and DEEP100M demonstrate 1.5x–6x improvements over state-of-the-art baselines while maintaining high recall and scalability to $10^8$ points, highlighting ESG’s practical impact for large-scale, range-constrained nearest neighbor retrievals.
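The range layout described for ESG$_{1D}$ can be illustrated with a small sketch. It assumes attribute values are the ranks $1,\dots,N$, that a half-bounded query has the form $[1, r]$, and that the search uses the smallest prefix range $[1, \lfloor N/2^i \rfloor]$ that still covers the query. The function name `covering_prefix` and this selection rule are illustrative assumptions, not the authors' code.

```python
def covering_prefix(n: int, r: int) -> tuple[int, int]:
    """Return (i, upper), where [1, upper] with upper = n // 2**i is the
    smallest prefix range that still contains the query range [1, r].

    Hypothetical helper: the selection rule is an assumption made for
    illustration, not taken from the paper's implementation.
    """
    if not 1 <= r <= n:
        raise ValueError("expected 1 <= r <= n")
    i = 0
    # Keep halving while the next, smaller prefix still covers r.
    while (n >> (i + 1)) >= r:
        i += 1
    return i, n >> i


if __name__ == "__main__":
    i, upper = covering_prefix(1024, 300)
    # Fraction of the chosen range that is actually inside the query range.
    print(i, upper, 300 / upper)
```

Under these assumptions the chosen range is always less than twice the size of the query range, so more than half of the searched points are in range. For example, with $N=1024$ and $r=300$ the sketch selects $[1, 512]$, of which about 59% lies inside the query range.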

Abstract

Range-filtering approximate $k$-nearest neighbor (RFAKNN) search takes as input a query vector and a numeric range, returning $k$ points from a database of $N$ high-dimensional points. The returned points must satisfy two criteria: their numeric values must lie within the specified query range, and they must be approximately the $k$ nearest points to the query vector. To strike a better balance between query accuracy and efficiency, we propose novel methods that relax the strict requirement for subranges to exactly match the query range. This elastic relaxation is based on a theoretical insight: allowing the controlled inclusion of out-of-range points during the search does not compromise the bounded complexity of the search process. Building on this insight, we prove that our methods reduce the number of required subranges to at most two, eliminating the $O(\log N)$ query overhead inherent in existing methods. Extensive experiments on real-world datasets demonstrate that our proposed methods outperform state-of-the-art approaches, achieving performance improvements of 1.5x to 6x while maintaining high accuracy.
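To make the problem statement concrete, the following minimal sketch contrasts an exact range-filtered search with a PostFiltering-style search over a superset range. It uses brute-force distance computation in place of a graph index, so it only illustrates the semantics. The toy data, the function names, and the 0-based indexing are assumptions chosen for illustration.

```python
import math


def range_filtered_knn(points, attrs, q, lo, hi, k):
    """Exact reference: the k nearest points whose attribute lies in [lo, hi]."""
    in_range = [i for i, a in enumerate(attrs) if lo <= a <= hi]
    in_range.sort(key=lambda i: math.dist(points[i], q))
    return in_range[:k]


def post_filter_knn(points, attrs, q, lo, hi, k, super_lo, super_hi):
    """Search a superset range [super_lo, super_hi], then discard
    out-of-range hits (brute force stands in for a graph search here)."""
    if not (super_lo <= lo and hi <= super_hi):
        raise ValueError("the searched range must contain the query range")
    candidates = [i for i, a in enumerate(attrs) if super_lo <= a <= super_hi]
    candidates.sort(key=lambda i: math.dist(points[i], q))
    result = []
    for i in candidates:  # visit candidates in order of increasing distance
        if lo <= attrs[i] <= hi:
            result.append(i)
            if len(result) == k:
                break
    return result


# Hypothetical toy data: 12 two-dimensional points with attributes 1..12.
points = [(0.0, 0.0), (5.0, 5.1), (9.0, 1.0), (1.0, 8.0), (8.0, 8.0), (2.0, 3.0),
          (5.0, 4.0), (7.0, 2.0), (3.0, 7.0), (6.0, 9.0), (9.0, 9.0), (4.0, 1.0)]
attrs = list(range(1, 13))
q = (5.0, 5.0)

exact = range_filtered_knn(points, attrs, q, 4, 10, k=2)
relaxed = post_filter_knn(points, attrs, q, 4, 10, k=2, super_lo=1, super_hi=12)
print(exact, relaxed)
```

In this toy example the overall nearest point (attribute 2) is out of range and is skipped, so both functions return the same in-range neighbors. The out-of-range points only add extra candidates to visit, which is the cost that the elastic relaxation is meant to keep bounded.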

Paper Structure

This paper contains 15 sections, 6 theorems, 4 equations, 11 figures, 5 tables, 4 algorithms.

Key Result

Theorem 1

Under the same assumptions as in DBLP:journals/pvldb/NSGFuXWC19, the expected search path length of MSNET with reverse edges for KNN search is:

Figures (11)

  • Figure 1: An example of an RFAKNN query, where each point $v_i$ has an additional numerical attribute $i$. Given a query point $q$ with the range $[4, 10]$, the answer (for $k=1$) is $v_7$ because the distance from $q$ to $v_7$ is the smallest among all in-range points $\mathcal{D}_{[4,10]}=\{v_4, v_5, \cdots, v_{10}\}$.
  • Figure 2: Illustration of $\mathsf{PreFiltering}$ and $\mathsf{PostFiltering}$
  • Figure 3: Example of Reconstruction-based Methods for the RFAKNN Query
  • Figure 4: Illustration of the Potential of $\mathsf{PostFiltering}$
  • Figure 5: Example of $\mathsf{ESG_{1D}}$
  • ...and 6 more figures

Theorems & Definitions (16)

  • definition 1
  • definition 2
  • Example 1
  • Example 2
  • Definition 1: Elastic Factor
  • Definition 2: Monotonic Search Path DBLP:journals/pvldb/NSGFuXWC19
  • Definition 3: Monotonic Search Network DBLP:journals/pvldb/NSGFuXWC19
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • ...and 6 more