ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data

Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia

TL;DR

ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy, and achieves state-of-the-art performance on all datasets, outperforming prior methods with 2-1,000× higher throughput at a fixed recall.

Abstract

Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords. Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. ACORN builds on Hierarchical Navigable Small Worlds (HNSW), a state-of-the-art graph-based approximate nearest neighbor index, and can be implemented efficiently by extending existing HNSW libraries. ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy. ACORN's predicate-agnostic construction algorithm is designed to enable this effective search strategy, while supporting a wide array of predicate sets and query semantics. We systematically evaluate ACORN on both prior benchmark datasets, with simple, low-cardinality predicate sets, and complex multi-modal datasets not supported by prior methods. We show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2-1,000x higher throughput at a fixed recall.

Paper Structure

This paper contains 32 sections, 5 equations, 14 figures, 6 tables, and 2 algorithms.

Figures (14)

  • Figure 1: Schematic drawing of search over an HNSW index. The search path is shown by blue arrows, beginning on level 2 and ending on level 0 at the query point, shown in green.
  • Figure 2: Schematic drawing of a dataset with no predicate clustering (top), a dataset with predicate clustering and positive query correlation (middle), and a dataset with predicate clustering and negative query correlation (bottom). Dark blue circles show points that pass the predicate, and light gray circles show points that fail the predicate. The query vectors are shown in green.
  • Figure 3: An illustration of the predicate subgraph, shown by the green nodes. ACORN searches over the predicate subgraph to emulate search over an oracle partition index.
  • Figure 4: Diagram of ACORN's neighbor selection strategies. Blue nodes represent neighbors that pass the query predicate. Sub-figure (a) shows the simple predicate-based filter applied to uncompressed edge lists of size $M\cdot \gamma$, followed by truncation to size $M=3$. Sub-figure (b) shows the compression-based heuristic. Sub-figure (c) shows the neighbor expansion strategy used in ACORN-1.
  • Figure 5: A comparison of HNSW and ACORN-$\gamma$'s strategies for (a) selecting candidate edges, shown for $M$=3, and (b) pruning candidate edges for each inserted node $v$, shown for $M$=3, $M_{\beta}$=2, $\gamma$=2.
  • ...and 9 more figures
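
The captions for Figures 3 and 4(a) describe searching only over nodes that pass the query predicate, using edge lists filtered by the predicate and then truncated to size $M$. The following is a minimal sketch of that idea, not the paper's algorithm. It assumes a single-level graph rather than a multi-level HNSW index, a toy 1-D dataset, and an entry point that already passes the predicate. The names `filter_neighbors` and `predicate_subgraph_search` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filter_neighbors(neighbors: List[int], passes: Callable[[int], bool], M: int) -> List[int]:
    """Keep only neighbors that pass the predicate, then truncate to M (cf. Figure 4(a))."""
    return [n for n in neighbors if passes(n)][:M]


def predicate_subgraph_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Vector],
    query: Vector,
    passes: Callable[[int], bool],
    entry: int,
    M: int,
    k: int,
    ef: int,
) -> List[int]:
    """Best-first search restricted to nodes that pass the predicate.

    Assumption: `entry` itself passes the predicate.
    """
    visited = {entry}
    d0 = sq_dist(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap (negated distance), at most ef entries
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest candidate is worse than the worst kept result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in filter_neighbors(graph[node], passes, M):
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ordered = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ordered][:k]


# Toy data (assumed for illustration): 8 points on a line, each storing
# M * gamma = 4 nearest neighbors; the predicate keeps even ids.
M, gamma = 2, 2
vectors = {i: (float(i),) for i in range(8)}
graph = {
    i: sorted((j for j in range(8) if j != i), key=lambda j: (abs(i - j), j))[: M * gamma]
    for i in range(8)
}
passes = lambda node_id: node_id % 2 == 0

result = predicate_subgraph_search(graph, vectors, (5.2,), passes, entry=0, M=M, k=2, ef=3)
print(result)  # [6, 4]
```

In this toy setting the unfiltered nearest neighbor of the query is node 5, but it fails the predicate, so the search returns nodes 6 and 4. Storing $M\cdot\gamma$ edges per node is what leaves some edges available after filtering.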