Efficient Filtered-ANN via Learning-based Query Planning

Zhuocheng Gan; Yifan Wang

TL;DR

The paper addresses filtered-ANN search in vector retrieval, where a system must choose between pre-filtering and post-filtering execution strategies. It introduces a learning-based query planner that makes this choice per query, using a lightweight selectivity estimator and an MLP-based core predictor, and that works with any underlying ANN index. The key contributions are a training-data framework built on a recall/latency utility; histogram- and statistics-based selectivity estimators for categorical, numeric, and mixed predicates; and a planner trained per dataset that achieves up to 4× speedups while maintaining high recall. Let $U = \frac{\mathrm{Recall@k}}{T_{\text{search}}}$ denote the end-to-end utility used for training labels, and let $s$ denote predicate selectivity; the planner learns to maximize $U$ by choosing between pre-filtering and post-filtering for each query. The approach reduces index-construction overhead and adapts to varying workloads. On real and synthetic datasets it preserves recall (e.g., recall@10 ≈ 0.96 on ArXiv) while substantially reducing latency, which makes it a practical option for predicate-aware vector search.
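The utility-based labeling described above can be illustrated with a minimal sketch. It assumes that each training query is run under both plans and labeled with the plan that scores the higher $U$; the summary does not spell out the labeling rule, so this argmax rule, the `PlanMeasurement` record, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlanMeasurement:
    """One measured run of a plan on a training query (hypothetical record)."""
    plan: str            # "pre" or "post"
    recall_at_k: float   # fraction of the true top-k neighbours returned
    latency_s: float     # end-to-end search time in seconds


def utility(m: PlanMeasurement) -> float:
    """U = Recall@k / T_search, following the definition in the TL;DR."""
    if m.latency_s <= 0:
        raise ValueError("latency must be positive")
    return m.recall_at_k / m.latency_s


def label_query(measurements):
    """Assumed labeling rule: pick the plan with the highest utility."""
    return max(measurements, key=utility).plan


# Made-up measurements for a single training query.
runs = [
    PlanMeasurement("pre", recall_at_k=1.00, latency_s=0.020),
    PlanMeasurement("post", recall_at_k=0.90, latency_s=0.004),
]
label = label_query(runs)
print(label)
```

In this made-up case post-filtering gives up some recall but is five times faster, so it receives the higher utility and becomes the training label for that query.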

Abstract

Filtered ANN search is an increasingly important problem in vector retrieval, yet systems face a difficult trade-off that stems from execution order. Pre-filtering (filtering first, then running ANN over the passing subset) requires expensive per-predicate index construction, while post-filtering (running ANN first, then filtering the candidates) may waste computation and lose recall under low selectivity, because too few candidates survive the filter. We introduce a learning-based query planning framework that dynamically selects the most effective execution plan for each query, using lightweight predictions derived from dataset and query statistics (e.g., dimensionality, corpus size, distribution features, and predicate statistics). The framework supports diverse filter types, including categorical/keyword and range predicates, and works with any backend ANN index. Experiments show that our method achieves up to 4× acceleration with ≥ 90% recall compared with strong baselines.
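The trade-off between the two execution orders can be seen in a small sketch. It is not the paper's implementation: exact brute-force search stands in for the ANN index, and the toy corpus, the `fetch` parameter, and all function names are assumptions made for illustration.

```python
import math


def nearest(points, query, k):
    """Exact k-NN by brute force; stands in for an ANN index call."""
    return sorted(points, key=lambda p: math.dist(p["vec"], query))[:k]


def pre_filter(points, query, k, predicate):
    """Filter first, then search only the passing subset."""
    subset = [p for p in points if predicate(p)]
    return nearest(subset, query, k)


def post_filter(points, query, k, predicate, fetch):
    """Search first (fetch >= k candidates), then drop non-matching ones."""
    candidates = nearest(points, query, fetch)
    return [p for p in candidates if predicate(p)][:k]


def is_rare(p):
    return p["tag"] == "rare"


# Toy corpus: 100 points on a line; only ids divisible by 25 carry the rare tag.
corpus = [
    {"id": i, "vec": (float(i),), "tag": "rare" if i % 25 == 0 else "common"}
    for i in range(100)
]
query = (10.0,)

selectivity = sum(is_rare(p) for p in corpus) / len(corpus)
pre = pre_filter(corpus, query, k=3, predicate=is_rare)
post = post_filter(corpus, query, k=3, predicate=is_rare, fetch=10)

print(selectivity, [p["id"] for p in pre], [p["id"] for p in post])
```

With only 4% of the points matching the predicate, none of the 10 fetched candidates survives the filter, so post-filtering returns nothing while pre-filtering finds the three closest matches. Raising `fetch` would recover recall at the cost of more search work, which is the kind of per-query trade-off a planner has to weigh.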

Paper Structure (15 sections, 2 figures, 2 tables)

This paper contains 15 sections, 2 figures, 2 tables.

Introduction
Related Work
Learning-based Query Planning
Training Data Preparation
Selectivity Estimation
Categorical (Keyword) Predicates
Numeric Range Predicates
Mixed Predicates
Core Planner: Choosing Execution Strategy
Evaluations
Experiment Settings
Evaluation Results
Index Construction Cost
End-to-End Results
Conclusion

Figures (2)

Figure 1: Workflow of the learned query planning framework.
Figure 2: Latency-recall trade-offs across datasets under varying selectivity.
