Work Sharing and Offloading for Efficient Approximate Threshold-based Vector Join

Kyoungmin Kim, Lennart Roth, Liang Liang, Anastasia Ailamaki

Abstract

Vector joins, which find all vector pairs between a set of query vectors and a set of data vectors whose distances are below a given threshold, are fundamental to modern vector and vector-relational database systems that power multimodal retrieval and semantic analytics. The existing state-of-the-art approach exploits work sharing among similar queries but still suffers from redundant index traversals and excessive distance computations. We propose a unified framework for efficient approximate vector joins that (1) introduces soft work sharing to reuse traversal results beyond the join results of previous queries, (2) builds a merged index over both query and data vectors to further speed up graph exploration, and (3) improves robustness for out-of-distribution queries through an adaptive hybrid search strategy. Experiments on eight datasets demonstrate substantial improvements in the efficiency-recall trade-off over the state of the art.

Paper Structure (36 sections, 15 figures, 2 tables, 4 algorithms)

Figures (15)

  • Figure 1: Vector join procedure for a single query vector. The search begins at the starting point of a graph-based index. 1) Greedy search navigates toward the query until it finds an in-range point. 2) Once one is found, breadth-first search (BFS) collects all reachable in-range points.
  • Figure 2: Vector join procedure for two queries, one in-distribution query (left) and one out-of-distribution query (right). Same legend as in Figure 1.
  • Figure 3: Work sharing examples for two queries $q_1$ and $q_2$. Blue circles are data points ordered by the distances to queries. Dashed boxes represent cached results per query.
  • Figure 4: Vector join procedures for two types of indexes. Among multiple queries, the procedure for the bottom-right one is shown.
  • Figure 5: Illustration of the relative neighborhood graph (RNG) rule, a common pruning rule used in graph-based indexes. $u$, $v$, and $w$ are vectors, and the circles are centered at $u$ and $v$ with radius $dist(u, v)$. $u$ and $v$ are connected if and only if there is no $w$ in the shaded region, that is, no $w$ such that $dist(u, w) < dist(u, v)$ and $dist(v, w) < dist(v, u)$.
  • ...and 10 more figures
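The two-phase procedure in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the adjacency-dict graph, the function name `range_search`, the use of Euclidean distance, and the choice to give up at a greedy local minimum are all assumptions made for the example.

```python
from collections import deque
import math


def range_search(graph, vectors, start, query, threshold):
    """Return ids of data points with distance < threshold to `query`
    that are reachable from the first in-range point through in-range nodes.

    graph:   dict mapping node id -> list of neighbor ids (hypothetical layout)
    vectors: dict mapping node id -> coordinate tuple
    """
    # Phase 1: greedy search. Move to the closest neighbor until the
    # current node is in range or no neighbor is closer to the query.
    current = start
    current_d = math.dist(vectors[current], query)
    while current_d >= threshold:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = math.dist(vectors[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == current:
            # Local minimum without an in-range point (assumed behavior:
            # report an empty result for this query).
            return set()
        current, current_d = best, best_d

    # Phase 2: BFS. Expand only through in-range nodes and collect them.
    found = {current}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in found and math.dist(vectors[nb], query) < threshold:
                found.add(nb)
                queue.append(nb)
    return found
```

Running this once per query vector yields the join result without any sharing between queries, which is the baseline cost that work sharing aims to reduce.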
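The RNG condition stated in the Figure 5 caption can be checked directly. The sketch below is a brute-force test over an explicit list of other vectors, assuming Euclidean distance; the function name `rng_connected` is hypothetical, and practical graph indexes typically apply such a rule only to a candidate neighbor set rather than to all points.

```python
import math


def rng_connected(u, v, others):
    """Return True if u and v satisfy the RNG rule with respect to `others`:
    no w is closer to both u and v than u and v are to each other."""
    d_uv = math.dist(u, v)
    return not any(
        math.dist(u, w) < d_uv and math.dist(v, w) < d_uv
        for w in others
    )
```

For example, with `u = (0, 0)` and `v = (2, 0)`, a point `w = (1, 0)` lies in the shaded region and removes the edge, whereas `w = (1, 5)` does not.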