Efficient Filtered-ANN via Learning-based Query Planning
Zhuocheng Gan, Yifan Wang
TL;DR
The paper tackles efficient filtered-ANN search in vector retrieval by addressing the trade-off between pre-filtering and post-filtering execution strategies. It introduces a learning-based query planner that makes a per-query decision using a lightweight selectivity estimator and an MLP-based predictor, and that remains compatible with any underlying ANN index. Key contributions include a training-data framework built on a recall/latency utility; histogram- and statistics-based selectivity estimators for categorical, numeric, and mixed predicates; and a per-dataset trained planner that demonstrates up to 4× speedups while maintaining high recall. Let $U = \frac{\mathrm{Recall@k}}{T_{\text{search}}}$ denote the end-to-end utility used for training labels, and let $s$ denote predicate selectivity; the planner learns to maximize $U$ by choosing between pre-filtering and post-filtering for each query. The approach reduces index-construction overhead and adapts to varying workloads. On real and synthetic datasets it preserves recall (e.g., recall@10 ≈ 0.96 on ArXiv) while substantially reducing latency, enabling scalable, predicate-aware vector search in production systems.
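To make the utility $U$ and the selectivity $s$ concrete, the sketch below shows one way a histogram-based estimate for a numeric range predicate and a utility-based training label could be computed. It is an illustration under stated assumptions, not the authors' implementation: the equi-width binning, the uniform-within-bin assumption, the tie-breaking rule, and all names (`build_histogram`, `estimate_selectivity`, `PlanMeasurement`, `choose_label`) are hypothetical.

```python
from dataclasses import dataclass


def build_histogram(values, num_bins):
    """Equi-width histogram over a numeric attribute (assumed binning scheme)."""
    if not values:
        raise ValueError("values must be non-empty")
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / num_bins
    if width == 0:
        width = 1.0  # all values identical: put everything in the first bin
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - vmin) / width), num_bins - 1)
        counts[idx] += 1
    return vmin, width, counts


def estimate_selectivity(hist, lo, hi):
    """Estimate s, the fraction of items with lo <= value <= hi.

    Assumes values are spread uniformly inside each bin.
    """
    vmin, width, counts = hist
    covered = 0.0
    for i, count in enumerate(counts):
        bin_lo = vmin + i * width
        bin_hi = bin_lo + width
        overlap = max(0.0, min(hi, bin_hi) - max(lo, bin_lo))
        covered += count * overlap / width
    return covered / sum(counts)


@dataclass
class PlanMeasurement:
    recall_at_k: float     # measured Recall@k for this plan
    search_seconds: float  # measured end-to-end search time (must be > 0)


def utility(m):
    """U = Recall@k / T_search."""
    return m.recall_at_k / m.search_seconds


def choose_label(pre, post):
    """Label a training query with the plan of higher utility.

    Ties go to pre-filtering; this tie-break is an arbitrary assumption.
    """
    return "pre" if utility(pre) >= utility(post) else "post"
```

With these definitions, a query whose pre-filtering run reaches recall 0.95 in 10 ms and whose post-filtering run reaches recall 0.90 in 4 ms would be labeled `"post"`, since $0.90 / 0.004 > 0.95 / 0.010$.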
Abstract
Filtered ANN search is an increasingly important problem in vector retrieval, yet systems face a difficult trade-off in execution order: pre-filtering (filtering first, then ANN over the passing subset) requires expensive per-predicate index construction, while post-filtering (ANN first, then filtering the candidates) may waste computation and lose recall under low selectivity, because too few candidates survive the filter. We introduce a learning-based query planning framework that dynamically selects the most effective execution plan for each query, using lightweight predictions derived from dataset and query statistics (e.g., dimensionality, corpus size, distribution features, and predicate statistics). The framework supports diverse filter types, including categorical/keyword and range predicates, and works with any backend ANN index. Experiments show that our method achieves up to 4× acceleration with ≥ 90% recall compared to strong baselines.
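The two execution orders described above can be contrasted with a toy example. This is a minimal sketch, not the paper's system: exact brute-force search stands in for the ANN index, and the function names, the `fetch` parameter, and the toy data are all assumptions made for illustration. It shows how post-filtering can come back empty when the predicate is rare and the candidate list is too short.

```python
import math


def exact_top_k(query, ids, vectors, k):
    """Stand-in for an ANN index: exact nearest neighbors among `ids`."""
    return sorted(ids, key=lambda i: math.dist(query, vectors[i]))[:k]


def pre_filter_search(query, vectors, attrs, predicate, k):
    """Filter first, then search only the items that pass the predicate."""
    passing = [i for i in range(len(vectors)) if predicate(attrs[i])]
    return exact_top_k(query, passing, vectors, k)


def post_filter_search(query, vectors, attrs, predicate, k, fetch):
    """Search first for `fetch` candidates, then drop those failing the predicate."""
    candidates = exact_top_k(query, range(len(vectors)), vectors, fetch)
    return [i for i in candidates if predicate(attrs[i])][:k]


# Toy corpus: 20 points on a line; only the two farthest points are "rare".
vectors = [(float(i), 0.0) for i in range(20)]
attrs = ["rare" if i >= 18 else "common" for i in range(20)]
query = (0.0, 0.0)


def is_rare(attr):
    return attr == "rare"


pre_result = pre_filter_search(query, vectors, attrs, is_rare, k=2)
post_result = post_filter_search(query, vectors, attrs, is_rare, k=2, fetch=10)

print(pre_result)   # [18, 19]
print(post_result)  # []: none of the 10 nearest candidates pass the filter
```

Raising `fetch` to 20 recovers both matches in this toy case, at the cost of scanning every candidate, which is the wasted computation the abstract refers to.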
