PathFinder: Efficiently Supporting Conjunctions and Disjunctions for Filtered Approximate Nearest Neighbor Search
Tianming Wu, Dixin Tang
TL;DR
PathFinder tackles filtered approximate nearest neighbor search (ANNS) by enabling administrators to create attribute-specific ANNS indexes and by employing a cost-based optimizer to efficiently process complex filters. It introduces a novel search utility to balance graph density and plan count, a two-phase optimization for conjunctions and disjunctions, and an index-borrowing technique to exploit correlations between attributes. Empirical results on four real-world datasets show up to 9.8x throughput gains over the best baseline at recall 0.95, with modest optimization overhead and scalable index configurations. The work advances practical filtered ANNS in vector databases and suggests future directions in index compression and automated index recommendation.
Abstract
Filtered approximate nearest neighbor search (ANNS) restricts the search to data objects whose attributes satisfy a given filter and retrieves the top-$K$ objects that are most semantically similar to the query object. Many graph-based ANNS indexes have been proposed to enable efficient filtered ANNS, but they remain limited in applicability or performance: indexes optimized for a specific attribute achieve high efficiency for filters on that attribute but fail to support complex filters with arbitrary conjunctions and disjunctions over multiple attributes. Inspired by the design of relational databases, this paper presents PathFinder, a new indexing framework that allows users to selectively create ANNS indexes optimized for filters on specific attributes and employs a cost-based optimizer to efficiently utilize them for processing complex filters. PathFinder includes three novel techniques: 1) a new optimization metric that captures the tradeoff between query execution time and accuracy, 2) a two-phase optimization for handling filters with conjunctions and disjunctions, and 3) an index borrowing optimization that uses an attribute-specific index to process filters on another attribute. Experiments on four real-world datasets show that PathFinder outperforms the best baseline by up to 9.8x in query throughput at recall 0.95.
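
To make the problem statement concrete, the following is a minimal sketch of filtered top-$K$ search with a filter that mixes a conjunction and a disjunction. It is an exact linear scan, not PathFinder's graph-based method or optimizer, and it only illustrates the result that a filtered ANNS index approximates. The function name `filtered_top_k`, the attribute names `price` and `brand`, and the toy data are all assumptions made for illustration.

```python
import heapq
import math


def filtered_top_k(vectors, attrs, query, predicate, k):
    """Exact filtered nearest neighbor search by linear scan (illustrative baseline)."""
    # Keep only the ids of objects whose attributes satisfy the filter.
    candidates = (i for i, a in enumerate(attrs) if predicate(a))
    # Among the survivors, return the k ids closest to the query (Euclidean distance),
    # ordered from nearest to farthest.
    return heapq.nsmallest(k, candidates, key=lambda i: math.dist(vectors[i], query))


# Hypothetical toy data: five 2-D vectors, each with two attributes.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (0.5, 0.5)]
attrs = [
    {"price": 10, "brand": "a"},
    {"price": 50, "brand": "b"},
    {"price": 20, "brand": "a"},
    {"price": 5, "brand": "c"},
    {"price": 80, "brand": "b"},
]
query = (0.0, 0.0)


def complex_filter(a):
    # (price < 30 AND brand = 'a') OR brand = 'c'
    return (a["price"] < 30 and a["brand"] == "a") or a["brand"] == "c"


result = filtered_top_k(vectors, attrs, query, complex_filter, k=2)
print(result)  # [0, 2]
```

Objects 1 and 4 are closer to the query than object 2, but they fail the filter and are therefore excluded. The scan touches every object, which is the cost that attribute-aware indexes aim to avoid.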
