RNSG: A Range-Aware Graph Index for Efficient Range-Filtered Approximate Nearest Neighbor Search

Zhiqiu Zou, Ziqi Yin, Rong-Hua Li, Hongchao Qin, Qiangqiang Dai, Guoren Wang

Abstract

Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors to a query vector among those objects whose numerical attributes fall within the range specified by the query. Existing state-of-the-art methods for RFANN search often require constructing multiple range-specific graph indexes to achieve high query performance, which incurs significant indexing overhead. To address this, we first establish a novel graph indexing theory, the range-aware relative neighborhood graph (RRNG), which jointly considers spatial and attribute proximity. We prove that the RRNG satisfies two crucial properties: (1) monotonic searchability, which ensures correct nearest neighbor retrieval via beam search; and (2) structural heredity, which guarantees that any range-induced subgraph remains a valid RRNG, thus enabling efficient search with a single graph index. Based on this theoretical foundation, we propose a new graph index called RNSG as a practical solution that efficiently approximates RRNG. We develop fast algorithms for both constructing the RNSG index and processing RFANN queries with it. Extensive experiments on five real-world datasets show that RNSG achieves significantly higher query performance with a more compact index and lower construction cost than existing state-of-the-art methods.
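
To make the query definition in the abstract concrete, the following is a minimal sketch of an exact range-filtered nearest neighbor query done by linear scan. It is not the RNSG method; it is only the brute-force baseline that any RFANN index approximates. The function name `rfann_brute_force`, the `(vector, attribute)` tuple layout, the Euclidean distance, and the toy data are assumptions made for illustration.

```python
import math


def rfann_brute_force(objects, query_vec, lo, hi, k):
    """Exact range-filtered k-NN by linear scan.

    objects: list of (vector, attribute) pairs (hypothetical layout).
    Returns the indices of the k objects closest to query_vec, in
    ascending distance order, among those with lo <= attribute <= hi.
    """
    candidates = [
        (math.dist(vec, query_vec), idx)
        for idx, (vec, attr) in enumerate(objects)
        if lo <= attr <= hi  # range filter on the numerical attribute
    ]
    candidates.sort()  # ascending Euclidean distance
    return [idx for _, idx in candidates[:k]]


# Toy data: four 2-D vectors, each with one numerical attribute.
objects = [
    ((0.0, 0.0), 1),
    ((1.0, 0.0), 5),
    ((0.1, 0.1), 9),
    ((2.0, 2.0), 4),
]

# Objects 0 and 2 are closest to the origin but fall outside [3, 6],
# so the filtered query returns objects 1 and 3 instead.
filtered = rfann_brute_force(objects, (0.0, 0.0), lo=3, hi=6, k=2)
unfiltered = rfann_brute_force(objects, (0.0, 0.0), lo=0, hi=10, k=2)
```

The scan costs time linear in the dataset size for every query, which is the cost a graph index is meant to avoid.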

Paper Structure (14 sections, 16 theorems, 1 equation, 12 figures, 2 tables, 3 algorithms)

Key Result

Lemma 1

Given a dataset $D$ of $n$ points in a metric space with distance function $\delta$, let $G(V, E)$ be an MRNG constructed on $D$. For any node $x \in V$, its nearest neighbor $y \in V$ can be found by performing a beam search on $G$.
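
The beam search referred to in Lemma 1 can be sketched as a best-first traversal that keeps only the closest `beam_width` nodes seen so far. This is a generic sketch under assumptions, not the paper's algorithm: the function name `beam_search`, the dictionary-based graph, the Euclidean distance, and the toy path graph are all hypothetical. The toy graph is chosen by hand so that distances to the query shrink along the path; it is not built by the MRNG rule.

```python
import heapq
import math


def beam_search(graph, vectors, query, entry, beam_width):
    """Best-first graph search that keeps at most beam_width results.

    graph:   dict mapping node -> list of out-neighbors
    vectors: dict mapping node -> coordinate tuple
    Returns the kept nodes sorted by ascending distance to query.
    """
    def dist(u):
        return math.dist(vectors[u], query)

    visited = {entry}
    frontier = [(dist(entry), entry)]   # min-heap of nodes still to expand
    results = [(-dist(entry), entry)]   # max-heap (negated) of best nodes so far

    while frontier:
        d, u = heapq.heappop(frontier)
        # Stop once the closest unexpanded node is worse than the worst kept one.
        if len(results) >= beam_width and d > -results[0][0]:
            break
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            dv = dist(v)
            if len(results) < beam_width or dv < -results[0][0]:
                heapq.heappush(frontier, (dv, v))
                heapq.heappush(results, (-dv, v))
                if len(results) > beam_width:
                    heapq.heappop(results)  # drop the current worst result

    return [u for _, u in sorted((-nd, u) for nd, u in results)]


# Toy path graph on a line: node i sits at x = i and links to its neighbors.
vectors = {i: (float(i),) for i in range(5)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

nearest = beam_search(graph, vectors, (3.2,), entry=0, beam_width=1)
top_two = beam_search(graph, vectors, (3.2,), entry=0, beam_width=2)
```

With `beam_width=1` the search degenerates to greedy descent, which succeeds here only because every hop along the path moves closer to the query.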

Figures (12)

  • Figure 1: Illustration of MRNG for ANN and RFANN queries (red dashed edges are pruned by MRNG). (a) Under a standard ANN query, the MRNG correctly preserves a path to the true nearest neighbor (node 3). (b) Under an RFANN query with a range constraint, the required path is broken, making the true nearest neighbor unreachable from the query node.
  • Figure 2: Illustration of the RRNG pruning strategy
  • Figure 3: Differences between MRNG and RRNG. The RRNG tends to replace edges (e.g., the edge $(1,4)$) with large attribute differences with those having smaller attribute differences (e.g., the edges $(1,2)$ and $(2,4)$). In the MRNG, $1\rightarrow 4$ is a monotonic path, while in the RRNG $1\rightarrow 2 \rightarrow 4$ is a monotonic path.
  • Figure 4: Illustration of the hereditary property of RRNG. The left subfigure shows the complete RRNG graph, while the right subfigure depicts the subgraph induced by a query range $[3, 6]$.
  • Figure 5: Illustration of entry node set generation
  • ...and 7 more figures
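
The range-induced subgraph described for Figures 1(b) and 4 can be sketched as follows: keep only the nodes whose attribute lies in the query range, and keep only the edges between kept nodes. This is an illustrative sketch, not the paper's construction. The helper names `range_induced_subgraph` and `reachable`, and the three-node toy graph, are hypothetical and do not reproduce the graphs in the figures.

```python
def range_induced_subgraph(graph, attrs, lo, hi):
    """Return the subgraph induced by nodes with lo <= attribute <= hi.

    graph: dict mapping node -> list of out-neighbors
    attrs: dict mapping node -> numerical attribute
    """
    keep = {u for u in graph if lo <= attrs[u] <= hi}
    return {u: [v for v in graph[u] if v in keep] for u in graph if u in keep}


def reachable(graph, source):
    """Set of nodes reachable from source by following edges."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


# Toy graph: the only route from node 1 to node 3 passes through node 2,
# whose attribute (9) lies outside the query range [3, 6].
graph = {1: [2], 2: [1, 3], 3: [2]}
attrs = {1: 4, 2: 9, 3: 5}

sub = range_induced_subgraph(graph, attrs, lo=3, hi=6)
full_reach = reachable(graph, 1)
sub_reach = reachable(sub, 1)
```

In this toy graph, node 3 is reachable from node 1 in the full graph but not in the induced subgraph, which is the kind of broken path that Figure 1(b) describes for an index built without regard to attributes.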

Theorems & Definitions (22)

  • Definition 1: Monotonic Relative Neighborhood Graph [fu2017fast]
  • Lemma 1: [fu2017fast]
  • Lemma 2: [fu2017fast]
  • Definition 2: RRNG
  • Example 1
  • Definition 3: Monotonic Path
  • Theorem 1: Monotonic Searchability Property
  • Corollary 1
  • Theorem 2: Hereditary Property
  • Example 2
  • ...and 12 more