Work Sharing and Offloading for Efficient Approximate Threshold-based Vector Join

Kyoungmin Kim; Lennart Roth; Liang Liang; Anastasia Ailamaki

Abstract

Vector joins, which find all pairs of a query vector and a data vector whose distance is below a given threshold, are fundamental to modern vector and vector-relational database systems that power multimodal retrieval and semantic analytics. The existing state-of-the-art approach exploits work sharing among similar queries but still suffers from redundant index traversals and excessive distance computations. We propose a unified framework for efficient approximate vector joins that (1) introduces soft work sharing to reuse traversal results beyond the join results of previous queries, (2) builds a merged index over both query and data vectors to further speed up graph exploration, and (3) improves robustness for out-of-distribution queries through an adaptive hybrid search strategy. Experiments on eight datasets demonstrate substantial improvements in the efficiency-recall trade-off over the state of the art.
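The join defined above can be stated as an exact reference computation, the nested loop join (NLJ) listed in the paper's outline. The following is a minimal Python sketch under our own assumptions: Euclidean distance, a strict `<` threshold test, and vectors as plain tuples. The function name `nested_loop_vector_join` is hypothetical and not the paper's code. The approximate methods described in the abstract aim to recover these pairs with fewer distance computations than this |Q|·|X| baseline.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nested_loop_vector_join(
    queries: Sequence[Vector], data: Sequence[Vector], threshold: float
) -> List[Tuple[int, int]]:
    """Exact threshold join by brute force (assumed Euclidean distance).

    Returns every (query_index, data_index) pair whose distance is strictly
    below `threshold`, using len(queries) * len(data) distance computations.
    """
    pairs: List[Tuple[int, int]] = []
    for qi, q in enumerate(queries):
        for di, x in enumerate(data):
            if math.dist(q, x) < threshold:
                pairs.append((qi, di))
    return pairs


if __name__ == "__main__":
    queries = [(0.0, 0.0), (5.0, 5.0)]
    data = [(0.5, 0.0), (3.0, 4.0), (5.0, 4.5)]
    print(nested_loop_vector_join(queries, data, threshold=1.0))  # prints [(0, 0), (1, 2)]
```

Each query is compared against every data vector, so the cost grows with the product of the two set sizes regardless of how few pairs actually qualify.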

Paper Structure

This paper contains 36 sections, 15 figures, 2 tables, and 4 algorithms.

Introduction
Background
Problem Definition
Vector Join Methods
Nested Loop Join (NLJ)
Index Nested Loop Join (INLJ)
INLJ with Work Sharing (WS)
Why Not Hash Joins for Vectors?
Vector Join Framework
Work Sharing and Offloading
Early Stopping (ES)
Hard Work Sharing (HWS)
Soft Work Sharing (SWS)
Work Offloading with Merged Index (MI)
Hybrid Search for Out-of-Distribution Queries
...and 21 more sections

Figures (15)

Figure 1: Vector join procedure for a single query vector. The search starts from the starting point of a graph-based index. 1) Greedy search moves toward the query until it finds an in-range point. 2) Breadth-first search (BFS) then collects all reachable in-range points.
Figure 2: Vector join procedure for two queries, one in-distribution query (left) and one out-of-distribution query (right). Same legend as Figure 1.
Figure 3: Work sharing examples for two queries $q_1$ and $q_2$. Blue circles are data points ordered by their distance to the queries. Dashed boxes represent cached results per query.
Figure 4: Vector join procedures for two types of indexes. Of the multiple queries, only the procedure for the bottom-right one is shown.
Figure 5: Illustration of the relative neighborhood graph (RNG) pruning rule commonly used in graph-based indexes. $u$, $v$, and $w$ are vectors, and the circles are centered at $u$ and $v$ with radius $dist(u, v)$. $u$ and $v$ are connected if and only if there is no $w$ in the shaded region, i.e., no $w$ such that $dist(u, w) < dist(u, v)$ and $dist(v, w) < dist(v, u)$.
...and 10 more figures
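The two-phase procedure in the Figure 1 caption can be sketched over an in-memory proximity graph. This is a minimal illustration, not the paper's implementation: it assumes an adjacency-list graph, Euclidean distance, a strict `<` range test, and a BFS that expands only through in-range nodes. All names (`greedy_to_range`, `bfs_in_range`, `range_search`, `Graph`) are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Sequence, Set

Vector = Sequence[float]
Graph = Dict[int, List[int]]  # hypothetical layout: node id -> neighbor ids


def greedy_to_range(graph: Graph, vectors: Sequence[Vector], start: int,
                    query: Vector, threshold: float) -> Optional[int]:
    """Phase 1: walk greedily toward `query` until some node is in range.

    Returns None if the walk stalls at a local minimum outside the range.
    """
    current = start
    current_dist = math.dist(vectors[current], query)
    while current_dist >= threshold:
        best, best_dist = current, current_dist
        for nb in graph[current]:
            d = math.dist(vectors[nb], query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:  # no neighbor is closer: local minimum
            return None
        current, current_dist = best, best_dist
    return current


def bfs_in_range(graph: Graph, vectors: Sequence[Vector], entry: int,
                 query: Vector, threshold: float) -> Set[int]:
    """Phase 2: BFS from an in-range entry node, expanding only in-range nodes."""
    found = {entry}
    frontier = deque([entry])
    while frontier:
        node = frontier.popleft()
        for nb in graph[node]:
            if nb not in found and math.dist(vectors[nb], query) < threshold:
                found.add(nb)
                frontier.append(nb)
    return found


def range_search(graph: Graph, vectors: Sequence[Vector], start: int,
                 query: Vector, threshold: float) -> Set[int]:
    """Both phases for a single query; an empty set means phase 1 failed."""
    entry = greedy_to_range(graph, vectors, start, query, threshold)
    if entry is None:
        return set()
    return bfs_in_range(graph, vectors, entry, query, threshold)


if __name__ == "__main__":
    vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.5, 0.0), (3.0, 0.5)]
    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}
    print(sorted(range_search(graph, vectors, 0, (3.2, 0.0), 0.6)))  # prints [3, 4, 5]
```

In the example, greedy search walks 0 → 1 → 2 → 3 and stops at node 3, the first in-range node; BFS then adds nodes 4 and 5. Because greedy search strictly decreases the distance at every step, phase 1 always terminates.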
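The edge condition in the Figure 5 caption can be checked directly. The sketch below assumes Euclidean distance and uses hypothetical helper names (`rng_connected`, `build_rng`). It builds an exact relative neighborhood graph by brute force, testing every third point $w$ against each candidate edge $(u, v)$. Practical graph indexes typically apply the rule only to a bounded candidate list per node rather than to all points.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple

Vector = Sequence[float]


def rng_connected(points: Sequence[Vector], u: int, v: int) -> bool:
    """RNG rule: keep edge (u, v) iff no third point w lies in the lune,
    i.e. no w with dist(u, w) < dist(u, v) and dist(v, w) < dist(v, u)."""
    d_uv = math.dist(points[u], points[v])
    for w in range(len(points)):
        if w == u or w == v:
            continue
        if (math.dist(points[u], points[w]) < d_uv
                and math.dist(points[v], points[w]) < d_uv):
            return False  # w occludes the edge
    return True


def build_rng(points: Sequence[Vector]) -> Set[Tuple[int, int]]:
    """Exact relative neighborhood graph by brute force: O(n^3) distance checks."""
    return {(u, v) for u, v in combinations(range(len(points)), 2)
            if rng_connected(points, u, v)}


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(sorted(build_rng(square)))  # prints [(0, 1), (0, 3), (1, 2), (2, 3)]
```

On the unit square, the four sides survive while both diagonals are pruned, because each diagonal's lune contains the two remaining corners.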