Approximate Nearest Neighbor Search with Window Filters
Joshua Engels, Benjamin Landrum, Shangdi Yu, Laxman Dhulipala, Julian Shun
TL;DR
This work identifies window filtering as an important yet underexplored extension of approximate nearest neighbor (ANN) search: each item carries a numeric label, and each query specifies a label interval. It introduces a modular framework built around a β-Window Search Tree (β-WST), which places ANN indices at internal nodes and answers a query by descending a tree of logarithmic depth over the label ranges. Combined with a base ANN index such as Vamana/DiskANN, the framework solves c-approximate window search with provable guarantees. The theoretical results give explicit time and memory bounds in terms of dataset size, doubling dimension, and the cost of the base index, and the work compares against optimized postfiltering and other baselines. Empirically, the approach delivers up to 75× speedups over strong baselines at comparable recall on diverse datasets, including adversarially constructed embeddings and timestamped image embeddings, demonstrating a practical path to efficient window-filtered semantic search in vector databases.
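The tree idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `WindowSearchTree`, the parameters `beta` and `leaf_size`, and the query strategy are assumptions made for illustration, and a brute-force scan stands in wherever a real system would query a per-node ANN index such as Vamana. Points are sorted by label, each node owns a contiguous slice of that order, and a query recurses only into nodes that overlap the window.

```python
import math
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, lo, hi):
        # Half-open slice [lo, hi) of the label-sorted point array.
        self.lo, self.hi = lo, hi
        self.children = []


class WindowSearchTree:
    """Hypothetical sketch of a tree over label-sorted points."""

    def __init__(self, points, labels, beta=2, leaf_size=2):
        order = sorted(range(len(points)), key=lambda i: labels[i])
        self.points = [points[i] for i in order]
        self.labels = [labels[i] for i in order]
        self.ids = order            # original ids, in label order
        self.beta = beta            # branching factor (assumed >= 2)
        self.leaf_size = leaf_size  # assumed >= 1
        self.root = self._build(0, len(points))

    def _build(self, lo, hi):
        node = Node(lo, hi)
        n = hi - lo
        if n > self.leaf_size:
            step = math.ceil(n / self.beta)
            for start in range(lo, hi, step):
                node.children.append(self._build(start, min(start + step, hi)))
        return node

    def _scan(self, q, lo, hi):
        # Stand-in for querying a node's ANN index: exact scan of [lo, hi).
        best = None
        for i in range(lo, hi):
            d = math.dist(q, self.points[i])
            if best is None or d < best[0]:
                best = (d, self.ids[i])
        return best

    def _query(self, node, q, lo, hi):
        lo, hi = max(lo, node.lo), min(hi, node.hi)
        if lo >= hi:
            return None  # node does not overlap the window
        if (lo == node.lo and hi == node.hi) or not node.children:
            # Node fully inside the window (use its index), or a leaf.
            return self._scan(q, lo, hi)
        found = [self._query(c, q, lo, hi) for c in node.children]
        found = [f for f in found if f is not None]
        return min(found) if found else None

    def query(self, q, a, b):
        """Return (distance, original id) of the nearest point with label in [a, b]."""
        lo = bisect_left(self.labels, a)
        hi = bisect_right(self.labels, b)
        return self._query(self.root, q, lo, hi)


points = [(0, 0), (1, 0), (5, 5), (2, 2), (9, 9), (0, 1)]
labels = [10, 20, 30, 40, 50, 60]
tree = WindowSearchTree(points, labels)
result = tree.query((0, 0), 20, 50)
```

Because the scan here is exact, the sketch returns the true filtered nearest neighbor; replacing `_scan` with a c-approximate index would trade that exactness for speed, which is the setting the work analyzes.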
Abstract
We define and investigate the problem of $\textit{c-approximate window search}$: approximate nearest neighbor search where each point in the dataset has a numeric label, and the goal is to find nearest neighbors to queries within arbitrary label ranges. Many semantic search problems, such as image and document search with timestamp filters, or product search with cost filters, are natural examples of this problem. We propose and theoretically analyze a modular tree-based framework for transforming an index that solves the traditional c-approximate nearest neighbor problem into a data structure that solves window search. On standard nearest neighbor benchmark datasets equipped with random label values, adversarially constructed embeddings, and image search embeddings with real timestamps, we obtain up to a $75\times$ speedup over existing solutions at the same level of recall.
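To make the problem statement concrete, the following is a small sketch of an exact reference solution and a check for the c-approximation condition. It assumes a closed label interval [a, b], Euclidean distance, and a single returned neighbor; the function names `exact_window_nn` and `is_c_approximate` are hypothetical and not taken from the paper.

```python
import math


def exact_window_nn(points, labels, q, a, b):
    """Brute force: nearest point to q among those with label in [a, b].

    Returns (distance, index), or None if no label falls in the window.
    """
    best = None
    for i, (p, lab) in enumerate(zip(points, labels)):
        if a <= lab <= b:
            d = math.dist(q, p)
            if best is None or d < best[0]:
                best = (d, i)
    return best


def is_c_approximate(points, labels, q, a, b, answer_id, c):
    """Check that answer_id lies in the window and is within c times the
    distance of the exact filtered nearest neighbor."""
    exact = exact_window_nn(points, labels, q, a, b)
    if exact is None:
        return answer_id is None
    if answer_id is None or not (a <= labels[answer_id] <= b):
        return False
    return math.dist(q, points[answer_id]) <= c * exact[0]


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (1.5, 0.0)]
labels = [1, 5, 7, 9]
q = (0.0, 0.0)
exact = exact_window_nn(points, labels, q, 4, 10)
```

In this toy data the unfiltered nearest neighbor (index 0, label 1) lies outside the window [4, 10], so a valid answer must come from the remaining points; an index that returns point 3 is acceptable for c = 2 but not for c = 1.2.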
