JAG: Joint Attribute Graphs for Filtered Nearest Neighbor Search

Haike Xu, Guy Blelloch, Laxman Dhulipala, Lars Gottesbüren, Rajesh Jayaram, Jakub Łącki

TL;DR

This work tackles the problem of efficiency in filtered nearest neighbor search across diverse filter types and query selectivities. It introduces JAG, a graph-based index that fuses vector similarity with attribute and filter proximities via the distances $dist_A$ and $dist_F$, augmented by a capped attribute distance to avoid navigational dead-ends. JAG supports two variants, Threshold-JAG and Weight-JAG, by constructing multiple edges through thresholds or weights and using a lexicographic comparator to guide exploration. Empirical results across five datasets show that JAG consistently outperforms filter-agnostic baselines, delivering high recall and substantially higher QPS, with multi-threshold and multi-weight configurations providing robust performance across varying selectivities and filter types.

Abstract

Although filtered nearest neighbor search is a fundamental task in modern vector search systems, the performance of existing algorithms is highly sensitive to query selectivity and filter type. In particular, existing solutions excel either at specific filter categories (e.g., label equality) or within narrow selectivity bands (e.g., pre-filtering for low selectivity) and are therefore a poor fit for practical deployments that demand generalization to new filter types and unknown query selectivities. In this paper, we propose JAG (Joint Attribute Graphs), a graph-based algorithm designed to deliver robust performance across the entire selectivity spectrum and to support diverse filter types. Our key innovation is the introduction of attribute and filter distances, which transform binary filter constraints into continuous navigational guidance. By constructing a proximity graph that jointly optimizes for both vector similarity and attribute proximity, JAG prevents navigational dead-ends and consistently outperforms prior graph-based filtered nearest neighbor search methods. Our experimental results across five datasets and four filter types (Label, Range, Subset, Boolean) demonstrate that JAG significantly outperforms existing state-of-the-art baselines in both throughput and recall robustness.
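
To make the idea of turning a binary filter into continuous guidance more concrete, the following minimal Python sketch shows one way a filter distance could be defined for three of the filter types named above. The function names and the specific formulas are illustrative assumptions and are not taken from the paper; the only property the sketch relies on is that the distance is zero exactly when a point satisfies the filter and grows as the point moves further from satisfying it.

```python
from typing import FrozenSet, Tuple


def label_filter_distance(point_label: str, query_label: str) -> float:
    """Assumed Label filter distance: 0 on a match, 1 otherwise."""
    return 0.0 if point_label == query_label else 1.0


def range_filter_distance(point_value: float, query_range: Tuple[float, float]) -> float:
    """Assumed Range filter distance: gap between the value and the interval."""
    lo, hi = query_range
    if point_value < lo:
        return lo - point_value
    if point_value > hi:
        return point_value - hi
    return 0.0  # the value lies inside [lo, hi]


def subset_filter_distance(point_labels: FrozenSet[str], required: FrozenSet[str]) -> float:
    """Assumed Subset filter distance: number of required labels the point lacks."""
    return float(len(required - point_labels))


def satisfies(filter_distance: float) -> bool:
    """A point passes the filter exactly when its filter distance is zero."""
    return filter_distance == 0.0
```

Under these assumed definitions, a point with attribute value 1 is at distance 2 from the range [3, 5], so a search can prefer it over a point with attribute value 0 even though neither passes the filter.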

Paper Structure (24 sections, 9 equations, 13 figures, 3 tables, 4 algorithms)

Figures (13)

  • Figure 1: QPS vs. recall plot for range filters on the MSTuring-10M dataset. Please refer to the experiments section for experimental details.
  • Figure 2: An example illustrating how JAG uses filter distance and vector distance to solve a range query. In this figure, both the vector value (x-axis) and attribute value (y-axis) are one-dimensional real numbers. The query specifies a filter range of [3,5]. The dashed arrow shows how filter distance and vector distance guide the greedy search toward the range query. Increasing intensity of the red color indicates improvement (decrease) in the filter distance.
  • Figure 3: QPS vs. recall plots for Label filters on the SIFT and ARXIV datasets. Note that NHQ is designed specifically for Label filters. FilteredVamana, StitchedVamana, and UNG are designed specifically for Label and Subset filters.
  • Figure 4: QPS vs. recall plots for subset filters on the MSTuring-10M, LAION-5M, and 25M datasets, and for boolean filters on the MSTuring-10M dataset. Note that FilteredVamana and StitchedVamana are only for label and subset filters.
  • Figure 5: QPS vs. recall plots for range filters on the ARXIV dataset and boolean filters on the MSTuring dataset. Note that iRangeGraph is designed specifically for range filters.
  • ...and 8 more figures
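
The setting described in the Figure 2 caption (one-dimensional vectors, one-dimensional attributes, and a filter range of [3, 5]) can be sketched as a toy greedy walk. This is a simplified illustration under stated assumptions, not the paper's algorithm: the points, the graph, the query vector value, the single-path greedy walk, and the range filter distance formula are all hypothetical. Candidates are compared lexicographically, first by filter distance and then by vector distance.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (vector value, attribute value), both 1-D


def range_filter_distance(attr: float, lo: float, hi: float) -> float:
    """Assumed filter distance: gap between the attribute and [lo, hi]."""
    return max(lo - attr, attr - hi, 0.0)


def greedy_search(
    points: Dict[int, Point],
    graph: Dict[int, List[int]],
    start: int,
    query_vec: float,
    lo: float,
    hi: float,
) -> List[int]:
    """Walk to the best neighbor until no neighbor improves the key."""

    def key(i: int) -> Tuple[float, float]:
        vec, attr = points[i]
        # Lexicographic comparator: filter distance first, vector distance second.
        return (range_filter_distance(attr, lo, hi), abs(vec - query_vec))

    path = [start]
    current = start
    while True:
        best = min(graph[current], key=key, default=current)
        if key(best) >= key(current):
            return path  # local optimum under the lexicographic order
        current = best
        path.append(current)


# Hypothetical toy data: node id -> (vector value, attribute value).
POINTS: Dict[int, Point] = {
    0: (0.0, 0.0),
    1: (1.0, 2.0),
    2: (2.0, 4.0),
    3: (4.0, 3.5),
    4: (5.0, 1.0),
}
# Hypothetical adjacency lists.
GRAPH: Dict[int, List[int]] = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}

path = greedy_search(POINTS, GRAPH, start=0, query_vec=4.2, lo=3.0, hi=5.0)
print(path)  # [0, 1, 2, 3]
```

In this toy run the walk first reduces the filter distance (3, then 1, then 0) and only then refines the vector distance among points inside the range, which mirrors the behavior the caption describes.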