iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search

Yuexuan Xu, Jianyang Gao, Yutong Gou, Cheng Long, Christian S. Jensen

TL;DR

This study materializes graph-based indexes, called elemental graphs, for a moderate number of ranges, and provides an effective and efficient algorithm that during querying can construct an index for any query range using the elemental graphs.

Abstract

Range-filtering approximate nearest neighbor (RFANN) search is attracting increasing attention in academia and industry. Given a set of data objects, each being a pair of a high-dimensional vector and a numeric value, an RFANN query with a vector and a numeric range as parameters returns the data object whose numeric value is in the query range and whose vector is nearest to the query vector. To process this query, a recent study proposes to build $O(n^2)$ dedicated graph-based indexes for all possible query ranges to enable efficient processing on a database of $n$ objects. As storing all these indexes is prohibitively expensive, the study constructs compressed indexes instead, which reduces the memory consumption considerably. However, this incurs suboptimal performance because the compression is lossy. In this study, instead of materializing a compressed index for every possible query range in preparation for querying, we materialize graph-based indexes, called elemental graphs, for a moderate number of ranges. We then provide an effective and efficient algorithm that during querying can construct an index for any query range using the elemental graphs. We prove that the time needed to construct such an index is low. We also cover an experimental study on real-world datasets that provides evidence that the materialized elemental graphs only consume moderate space and that the proposed method is capable of superior and stable query performance across different query workloads.
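The query definition in the abstract can be made concrete with an exact linear-scan baseline. This is only a minimal sketch of what an RFANN query is supposed to return, not the paper's method: the function name `rfann_brute_force`, the closed interval `[lo, hi]`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math


def rfann_brute_force(objects, query_vec, lo, hi):
    """Exact range-filtered nearest neighbor by linear scan.

    objects: list of (vector, value) pairs (hypothetical layout).
    Returns (index, distance) of the in-range object nearest to query_vec,
    or (None, inf) if no object's value falls in [lo, hi].
    """
    best_idx, best_dist = None, math.inf
    for idx, (vec, val) in enumerate(objects):
        if not (lo <= val <= hi):
            continue  # numeric value outside the query range: filtered out
        d = math.dist(vec, query_vec)  # Euclidean distance (assumed metric)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist


# Toy data: each object is a (2-d vector, numeric value) pair.
objects = [
    ((0.0, 0.0), 5),
    ((1.0, 1.0), 12),
    ((0.9, 1.1), 30),
    ((3.0, 3.0), 15),
]

in_range_hit = rfann_brute_force(objects, (1.0, 1.0), 10, 20)
shifted_range_hit = rfann_brute_force(objects, (1.0, 1.0), 25, 40)
empty_range_hit = rfann_brute_force(objects, (1.0, 1.0), 100, 200)
```

The same query vector yields different answers as the range changes (object 1 for `[10, 20]`, object 2 for `[25, 40]`). A graph-based RFANN index aims to return these answers approximately without scanning every object.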

Paper Structure (24 sections, 2 theorems, 5 figures, 3 tables, 1 algorithm)

Key Result

theorem 1

The time complexity of the algorithm for materializing the entire index of our method differs by up to a sub-logarithmic factor from that for constructing HNSW on the set of all objects.

Figures (5)

  • Figure 1: The iRangeGraph index applied to 16 data objects. It is based on a segment tree with 5 layers, namely L0 to L4. L0 has one segment corresponding to the range [1, 16]. L1 has two segments corresponding to the ranges [1, 8] and [9, 16] respectively, etc. The elemental graphs are materialized for each segment with respective data objects (e.g., an elemental graph based on $O_1, O_2, \ldots, O_8$ is materialized for segment [1, 8] and the out-going edges of node $O_6$ are represented by the arrows). In all elemental graphs, the maximum out-degree $m$ of a node is 3 in this example.
  • Figure 2: Comparison of all methods on the single-attribute RFANN query with different datasets and query workloads of mixed, large, moderate and small range fractions. A missing curve for a method on a given dataset and query workload indicates that the method fails to achieve at least 0.8 recall.
  • Figure 3: The ablation study of our core algorithm (constructing the dedicated graph on the fly) and the edge selection algorithm.
  • Figure 4: Comparison between iRangeGraph and Oracle-HNSW under mixed range fraction.
  • Figure 5: Multi-attribute RFANN query performance.
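The segment-tree layout described in the Figure 1 caption can be reproduced in a few lines. The sketch below assumes that the number of objects is a power of two and that segments are 1-indexed and inclusive, as in the caption. The helper names `build_layers` and `decompose` are hypothetical. `decompose` is the textbook segment-tree range decomposition, included only to show which segments a query range overlaps; it is not claimed to be the paper's query-time graph construction.

```python
def build_layers(n):
    """Return the segments of each layer L0, L1, ... for n objects.

    Assumes n is a power of two; segments are 1-indexed and inclusive.
    """
    layers = []
    size = n
    while size >= 1:
        layers.append([(start, start + size - 1) for start in range(1, n + 1, size)])
        size //= 2
    return layers


def decompose(lo, hi, seg_lo, seg_hi):
    """Standard segment-tree decomposition of [lo, hi] into maximal tree segments."""
    if hi < seg_lo or seg_hi < lo:
        return []  # no overlap with this segment
    if lo <= seg_lo and seg_hi <= hi:
        return [(seg_lo, seg_hi)]  # segment lies fully inside the query range
    mid = (seg_lo + seg_hi) // 2
    return decompose(lo, hi, seg_lo, mid) + decompose(lo, hi, mid + 1, seg_hi)


n, m = 16, 3  # 16 objects and maximum out-degree 3, as in Figure 1
layers = build_layers(n)

# Every object appears in exactly one segment per layer, so the total number
# of stored out-edges is at most m * n * (number of layers).
edge_upper_bound = m * n * len(layers)

covering_segments = decompose(3, 11, 1, n)
```

For `n = 16` this yields the five layers L0 to L4 from the caption, and the range [3, 11] decomposes into the segments [3, 4], [5, 8], [9, 10] and [11, 11]. The edge bound of 240 is only an upper limit under the stated out-degree; single-object segments contribute no edges in practice.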

Theorems & Definitions (4)

  • definition 1: RNG [rng]
  • definition 2: Range-filtering ANN (RFANN) query [WindowFilter, segmentgraph]
  • theorem 1
  • theorem 2