JAG: Joint Attribute Graphs for Filtered Nearest Neighbor Search
Haike Xu, Guy Blelloch, Laxman Dhulipala, Lars Gottesbüren, Rajesh Jayaram, Jakub Łącki
TL;DR
This work addresses the efficiency of filtered nearest neighbor search across diverse filter types and query selectivities. It introduces JAG, a graph-based index that combines vector similarity with attribute and filter proximity, measured by the distances $dist_A$ and $dist_F$, and uses a capped attribute distance to avoid navigational dead-ends. JAG comes in two variants, Threshold-JAG and Weight-JAG, which construct edges for multiple thresholds or multiple weights, respectively, and both use a lexicographic comparator to guide exploration. Across five datasets, JAG consistently outperforms filter-agnostic baselines, achieving high recall at substantially higher QPS, and its multi-threshold and multi-weight configurations remain robust across selectivities and filter types.
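The comparator idea in the summary can be illustrated with a small Python sketch. It assumes each candidate carries a precomputed filter distance (standing in for $dist_F$) and that the comparator orders by filter distance first and vector distance second. The helper names, the toy data, and the weighted-sum alternative are hypothetical illustrations, not JAG's actual definitions.

```python
import math
from typing import List, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain L2 distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lexicographic_key(dist_f: float, dist_v: float) -> Tuple[float, float]:
    """Hypothetical comparator key: filter distance first, vector distance second.

    Python compares tuples element by element, so a candidate closer to
    satisfying the filter always wins; vector distance only breaks ties.
    """
    return (dist_f, dist_v)

def weighted_score(dist_f: float, dist_v: float, weight: float) -> float:
    """Hypothetical single-score alternative: vector distance plus a
    weighted filter distance (`weight` is illustrative, not from the paper)."""
    return dist_v + weight * dist_f

# Toy candidates: (name, vector, filter distance). All values are made up.
query = [0.0, 0.0]
candidates: List[Tuple[str, List[float], float]] = [
    ("a", [0.1, 0.0], 2.0),  # very close in vector space, far from the filter
    ("b", [3.0, 4.0], 0.0),  # satisfies the filter, far in vector space
    ("c", [1.0, 0.0], 0.0),  # satisfies the filter, moderately close
]

by_lex = sorted(candidates, key=lambda c: lexicographic_key(c[2], euclidean(query, c[1])))
by_weight = sorted(candidates, key=lambda c: weighted_score(c[2], euclidean(query, c[1]), weight=0.1))

print([c[0] for c in by_lex])     # ['c', 'b', 'a']
print([c[0] for c in by_weight])  # ['a', 'c', 'b']
```

With the lexicographic key, any candidate that satisfies the filter outranks one that does not, however close the latter is in vector space. The weighted score instead lets a small filter violation be traded against vector proximity; raising `weight` pushes the ordering back toward the lexicographic one.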
Abstract
Although filtered nearest neighbor search is a fundamental task in modern vector search systems, the performance of existing algorithms is highly sensitive to query selectivity and filter type. In particular, existing solutions excel either at specific filter categories (e.g., label equality) or within narrow selectivity bands (e.g., pre-filtering for low-selectivity queries), and are therefore a poor fit for practical deployments that must generalize to new filter types and unknown query selectivities. In this paper, we propose JAG (Joint Attribute Graphs), a graph-based algorithm designed to deliver robust performance across the entire selectivity spectrum and to support diverse filter types. Our key innovation is the introduction of attribute and filter distances, which transform binary filter constraints into continuous navigational guidance. By constructing a proximity graph that jointly optimizes for vector similarity and attribute proximity, JAG avoids navigational dead-ends and consistently outperforms prior graph-based filtered nearest neighbor search methods. Our experiments across five datasets and four filter types (Label, Range, Subset, Boolean) show that JAG significantly outperforms existing state-of-the-art baselines in both throughput and recall robustness.
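To make the idea of turning a binary filter into continuous guidance concrete, here is a minimal Python sketch of filter distances for three of the four filter types. The definitions are assumptions chosen for illustration (distance outside a range, count of missing required labels, and a 0/1 fallback for label equality), not the paper's exact $dist_F$; all function names are hypothetical, and Boolean filters are omitted.

```python
from typing import FrozenSet

def range_filter_distance(value: float, lo: float, hi: float) -> float:
    """Hypothetical range distance: 0 inside [lo, hi], otherwise how far
    the attribute value lies outside the range."""
    return max(lo - value, 0.0, value - hi)

def subset_filter_distance(node_labels: FrozenSet[str], required: FrozenSet[str]) -> int:
    """Hypothetical subset distance: number of required labels the node
    is missing (0 means the filter passes)."""
    return len(required - node_labels)

def label_filter_distance(node_label: str, query_label: str) -> int:
    """Label equality has no natural gradient here, so this stays 0/1."""
    return 0 if node_label == query_label else 1

print(range_filter_distance(5.0, 10.0, 20.0))   # 5.0
print(range_filter_distance(15.0, 10.0, 20.0))  # 0.0
print(subset_filter_distance(frozenset({"red", "large"}),
                             frozenset({"red", "cheap", "new"})))  # 2
print(label_filter_distance("red", "blue"))     # 1
```

A distance of 0 means the node passes the filter, and smaller nonzero values mark nodes that nearly match, which gives a graph search a direction to move in even when no current neighbor satisfies the filter. Label equality gets no such gradient in this sketch.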
