Efficient Filtered-ANN via Learning-based Query Planning

Zhuocheng Gan, Yifan Wang

TL;DR

The paper addresses filtered-ANN search in vector retrieval, where a system must choose between pre-filtering and post-filtering execution strategies. It introduces a learning-based query planner that decides per query, using a lightweight selectivity estimator and an MLP-based predictor, and that works with any underlying ANN index. Key contributions include a training-data framework built on a recall/latency utility, histogram- and statistics-based selectivity estimators for categorical, numeric, and mixed predicates, and a per-dataset trained planner that achieves up to 4× speedups while maintaining high recall. The approach reduces index-construction overhead and adapts to varying workloads. Let $U = \frac{\mathrm{Recall@k}}{T_{\text{search}}}$ denote the end-to-end utility used for training labels, and let $s$ denote predicate selectivity; the planner learns to maximize $U$ by choosing between pre-filtering and post-filtering for each query. The method is evaluated on real and synthetic datasets, where it preserves recall (e.g., recall@10 ≈ 0.96 on ArXiv) while substantially reducing latency.
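
As a rough illustration of how the utility $U$ could turn measurements into training labels, the sketch below computes $U = \mathrm{Recall@k} / T_{\text{search}}$ for each plan and picks the plan with the larger value. The `PlanMeasurement` record, the function names, and the sample numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanMeasurement:
    """One measured run of a plan on a training query (hypothetical record)."""
    plan: str              # "pre" or "post"
    recall_at_k: float     # fraction of the true top-k neighbours returned
    search_seconds: float  # end-to-end search time, T_search


def utility(m: PlanMeasurement) -> float:
    """U = Recall@k / T_search, following the definition in the TL;DR."""
    if m.search_seconds <= 0:
        raise ValueError("search time must be positive")
    return m.recall_at_k / m.search_seconds


def label_query(measurements: List[PlanMeasurement]) -> str:
    """Return the plan with the highest utility; used here as the training label."""
    return max(measurements, key=utility).plan


# Illustrative numbers only, not results from the paper.
runs = [
    PlanMeasurement("pre", recall_at_k=0.99, search_seconds=0.020),
    PlanMeasurement("post", recall_at_k=0.96, search_seconds=0.005),
]
best = label_query(runs)
```

With these made-up measurements, post-filtering gives up a little recall but is four times faster, so it receives the higher utility and becomes the label for that query.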

Abstract

Filtered ANN search is an increasingly important problem in vector retrieval, yet systems face a difficult trade-off due to the execution order: pre-filtering (filtering first, then ANN over the passing subset) requires expensive per-predicate index construction, while post-filtering (ANN first, then filtering candidates) may waste computation and lose recall under low selectivity because too few candidates remain after filtering. We introduce a learning-based query planning framework that dynamically selects the most effective execution plan for each query, using lightweight predictions derived from dataset and query statistics (e.g., dimensionality, corpus size, distribution features, and predicate statistics). The framework supports diverse filter types, including categorical/keyword and range predicates, and works with any backend ANN index. Experiments show that our method achieves up to 4× speedup with ≥ 90% recall compared to strong baselines.
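
The two execution orders described above can be sketched in a few lines. This is a minimal illustration, assuming an exact brute-force search as a stand-in for the ANN index; the function names and the `fetch` parameter are hypothetical, and the sketch does not model the per-predicate index construction cost of pre-filtering.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Sequence

Vector = Sequence[float]
Predicate = Callable[[Dict], bool]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: Vector, vectors: List[Vector], ids: Iterable[int], k: int) -> List[int]:
    """Exact k nearest neighbours among `ids` (stand-in for an ANN index)."""
    return heapq.nsmallest(k, ids, key=lambda i: sq_dist(query, vectors[i]))


def pre_filter(query, vectors, attrs, pred: Predicate, k: int) -> List[int]:
    """Filter first, then search only the passing subset."""
    passing = [i for i in range(len(vectors)) if pred(attrs[i])]
    return top_k(query, vectors, passing, k)


def post_filter(query, vectors, attrs, pred: Predicate, k: int, fetch: int) -> List[int]:
    """Search first for `fetch` candidates, then drop those failing the predicate."""
    candidates = top_k(query, vectors, range(len(vectors)), fetch)
    return [i for i in candidates if pred(attrs[i])][:k]


# Toy 1-D corpus: only the two points farthest from the query pass the filter.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
attrs = [{"tag": "b"}] * 4 + [{"tag": "a"}] * 2
is_a = lambda row: row["tag"] == "a"

pre = pre_filter([0.0], vectors, attrs, is_a, k=2)
post_small = post_filter([0.0], vectors, attrs, is_a, k=2, fetch=4)
post_large = post_filter([0.0], vectors, attrs, is_a, k=2, fetch=6)
```

In this toy case, post-filtering with `fetch=4` returns nothing because none of the four nearest candidates pass the predicate, while a larger `fetch` recovers the same answer as pre-filtering at the cost of scanning more candidates. This is the recall loss and wasted computation that the abstract attributes to post-filtering.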

Paper Structure (15 sections, 2 figures, 2 tables)

This paper contains 15 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Workflow of the learned query planning framework.
  • Figure 2: Latency-recall trade-offs across datasets under varying selectivity.