PathFinder: Efficiently Supporting Conjunctions and Disjunctions for Filtered Approximate Nearest Neighbor Search

Tianming Wu, Dixin Tang

TL;DR

PathFinder tackles filtered approximate nearest neighbor search by enabling administrators to create attribute-specific ANNS indexes and by employing a cost-based optimizer to efficiently process complex filters. It introduces a novel search utility to balance graph density and plan count, a two-phase optimization for conjunctions and disjunctions, and an index-borrowing technique to exploit correlations between attributes. Empirical results on four real-world datasets show up to 9.8x throughput gains at recall $0.95$ over strong baselines, with modest optimization overhead and scalable index configurations. The work advances practical filtered ANNS in vector databases and suggests future directions in index compression and automated index recommendation.

Abstract

Filtered approximate nearest neighbor search (ANNS) restricts the search to data objects whose attributes satisfy a given filter and retrieves the top-$K$ objects that are most semantically similar to the query object. Many graph-based ANNS indexes have been proposed to enable efficient filtered ANNS, but they remain limited in applicability or performance: indexes optimized for a specific attribute achieve high efficiency for filters on that attribute but fail to support complex filters with arbitrary conjunctions and disjunctions over multiple attributes. Inspired by the design of relational databases, this paper presents PathFinder, a new indexing framework that allows users to selectively create ANNS indexes optimized for filters on specific attributes and employs a cost-based optimizer to efficiently utilize them for processing complex filters. PathFinder includes three novel techniques: 1) a new optimization metric that captures the tradeoff between query execution time and accuracy, 2) a two-phase optimization for handling filters with conjunctions and disjunctions, and 3) an index borrowing optimization that uses an attribute-specific index to process filters on another attribute. Experiments on four real-world datasets show that PathFinder outperforms the best baseline by up to 9.8x in query throughput at recall 0.95.
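To make the problem statement concrete, the following is a minimal sketch of what a filtered top-$K$ query computes, written as an exact brute-force scan. It is not PathFinder's method and uses no graph index or optimizer; it only illustrates the semantics that such a system approximates. The function name `filtered_top_k`, the Euclidean distance, and the toy data are assumptions made for illustration; the attribute names echo the "citation count" and "topic" attributes mentioned in the figure captions.

```python
import math
from typing import Callable, Dict, List, Sequence

Attrs = Dict[str, object]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_top_k(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    attrs: List[Attrs],
    predicate: Callable[[Attrs], bool],
    k: int,
) -> List[int]:
    """Return indices of the k nearest objects whose attributes pass the filter.

    Exact scan over all objects (hypothetical baseline, not an ANNS index).
    """
    candidates = [
        (euclidean(query, vec), idx)
        for idx, (vec, attr) in enumerate(zip(vectors, attrs))
        if predicate(attr)  # apply the filter before ranking
    ]
    candidates.sort()  # ascending distance, ties broken by index
    return [idx for _, idx in candidates[:k]]


# Toy data: four objects with a 2-D embedding and two attributes each.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
attrs = [
    {"citation_count": 5, "topic": "db"},
    {"citation_count": 120, "topic": "ml"},
    {"citation_count": 80, "topic": "db"},
    {"citation_count": 300, "topic": "db"},
]


def predicate(a: Attrs) -> bool:
    """(citation_count >= 50 AND topic == "db") OR topic == "ml"."""
    return (a["citation_count"] >= 50 and a["topic"] == "db") or a["topic"] == "ml"


result = filtered_top_k((0.0, 0.0), vectors, attrs, predicate, k=2)
print(result)
```

The filter here mixes a conjunction and a disjunction over two attributes, which is the class of predicates the abstract says single-attribute indexes fail to support. The scan costs time linear in the dataset size, which is the cost that index-based approaches aim to avoid.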

Paper Structure

This paper contains 18 sections, 1 equation, 19 figures, 3 tables, 2 algorithms.

Figures (19)

  • Figure 1: A tree-based graph index built on a numeric attribute. The attribute range is recursively partitioned, and for each tree node, a proximity graph is built over the data objects whose attribute values fall within the node's range.
  • Figure 2: An example illustrating best-first search on a proximity graph for a graph index.
  • Figure 3: The workflow of PathFinder for processing an example query.
  • Figure 4: A tree-based graph index and a hash-based graph index built on the "citation count" and "topic" attributes.
  • Figure 5: A predicate with three conjunctive clauses on two attributes; tree-based indexes are built for both attributes.
  • ...and 14 more figures