DEG: Efficient Hybrid Vector Search Using the Dynamic Edge Navigation Graph

Ziqi Yin, Jianyang Gao, Pasquale Balsebre, Gao Cong, Cheng Long

TL;DR

This work tackles the Hybrid Vector Query (HVQ), in which each object is described by two vectors and similarity is weighted by a query-dependent parameter α. HVQ challenges traditional approximate nearest neighbor search (ANNS) indexes, which fix α at construction time. The authors introduce the Dynamic Edge Navigation Graph (DEG), a graph-based ANNS index built on three innovations: Greedy Pareto Frontier Search (GPS), which obtains a candidate neighbor set covering all α values; a dynamic edge pruning strategy, which assigns each edge an active range so that Relative Neighborhood Graph (RNG) pruning can still be applied at query time; and an edge seed method that accelerates search. Experiments on five real-world datasets show that DEG consistently outperforms fixed-α baselines across varying α, approaches oracle performance in many settings, and scales to large datasets where competing methods struggle. The approach enables robust HVQ retrieval for bimodal data and opens avenues for extending to multi-vector queries and disk-based vector databases.
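
To make the query model concrete, the sketch below computes the exact (brute-force) hybrid top-k answer, which is the result an ANNS index such as DEG approximates. It assumes Euclidean distance in each modality and a hybrid distance of α·d1 + (1 − α)·d2, where lower means more similar; the function names `hybrid_distance` and `exact_hvq` are hypothetical, and the paper's exact weighting and metric may differ.

```python
import numpy as np


def hybrid_distance(q1, q2, X1, X2, alpha):
    """Hybrid distance from query (q1, q2) to every object (X1[i], X2[i]).

    Assumption: alpha * ||q1 - x1|| + (1 - alpha) * ||q2 - x2|| with
    Euclidean distance per modality; lower means more similar.
    """
    d1 = np.linalg.norm(X1 - q1, axis=1)
    d2 = np.linalg.norm(X2 - q2, axis=1)
    return alpha * d1 + (1.0 - alpha) * d2


def exact_hvq(q1, q2, X1, X2, alpha, k):
    """Brute-force top-k HVQ: the exact answer an ANNS index approximates."""
    d = hybrid_distance(q1, q2, X1, X2, alpha)
    idx = np.argsort(d, kind="stable")[:k]
    return idx, d[idx]


q = np.zeros(2)
X1 = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])  # modality-1 distances 0, 1, 5
X2 = np.array([[5.0, 0.0], [1.0, 0.0], [0.0, 0.0]])  # modality-2 distances 5, 1, 0
for alpha in (1.0, 0.5, 0.0):
    print(alpha, exact_hvq(q, q, X1, X2, alpha, 1)[0])  # top-1 index: 0, 1, 2
```

In this toy example the top-1 object changes from 0 to 1 to 2 as α moves from 1 to 0.5 to 0, which is why an index built for one fixed α can route queries poorly for another.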

Abstract

Bimodal data, such as image-text pairs, has become increasingly prevalent in the digital era. The Hybrid Vector Query (HVQ) is an effective approach for querying such data and has recently attracted considerable research attention. It computes the similarity score of an object represented by two vectors as a weighted sum of the similarities of the individual vectors, with a query-specific parameter $\alpha$ determining the weight. Existing methods for HVQ typically construct Approximate Nearest Neighbor Search (ANNS) indexes with a fixed $\alpha$ value, which leads to significant performance degradation when the query's $\alpha$ changes dynamically across scenarios and needs. In this study, we introduce the Dynamic Edge Navigation Graph (DEG), a graph-based ANNS index that maintains efficiency and accuracy as $\alpha$ varies. It includes three novel components: (1) a greedy Pareto frontier search algorithm that computes a candidate neighbor set for each node, comprising the node's approximate nearest neighbors for all possible $\alpha$ values; (2) a dynamic edge pruning strategy that determines the final edges from the candidate set and assigns each edge an active range. This active range enables the Relative Neighborhood Graph's pruning strategy to be applied dynamically based on the query's $\alpha$ value, skipping redundant edges at query time and achieving a better accuracy-efficiency trade-off; and (3) an edge seed method that accelerates the querying process. Extensive experiments on real-world datasets show that DEG outperforms existing methods under varying $\alpha$ values.
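
Why an edge's usefulness depends on $\alpha$ can be illustrated with a small sketch. `rng_prune` applies the classic RNG-style rule for one fixed $\alpha$: candidates are scanned in increasing hybrid distance from node p, and a candidate c is dropped if an already-kept neighbor n satisfies d(n, c) < d(p, c). `sampled_active_alphas` then records, on a grid of $\alpha$ values, when each edge survives. This is a brute-force illustration under an assumed hybrid distance α·d1 + (1 − α)·d2, not DEG's dynamic edge pruning algorithm; all names are hypothetical.

```python
import numpy as np


def hybrid(a, b, alpha):
    """Assumed hybrid distance between objects a and b, each a pair (v1, v2)."""
    return (alpha * np.linalg.norm(a[0] - b[0])
            + (1.0 - alpha) * np.linalg.norm(a[1] - b[1]))


def rng_prune(p, candidates, objs, alpha, max_degree):
    """Classic RNG-style pruning of node p's candidates for ONE fixed alpha."""
    # Scan candidates from nearest to farthest under this alpha.
    order = sorted(candidates, key=lambda c: hybrid(objs[p], objs[c], alpha))
    kept = []
    for c in order:
        d_pc = hybrid(objs[p], objs[c], alpha)
        # Drop c if some kept neighbor n is closer to c than p is
        # (n is already no farther from p than c, given the scan order).
        if all(hybrid(objs[n], objs[c], alpha) >= d_pc for n in kept):
            kept.append(c)
            if len(kept) == max_degree:
                break
    return kept


def sampled_active_alphas(p, candidates, objs, max_degree, grid):
    """Hypothetical brute-force illustration, not DEG's algorithm: for each
    candidate edge (p, c), list the grid alphas for which rng_prune keeps it."""
    active = {c: [] for c in candidates}
    for alpha in grid:
        for c in rng_prune(p, candidates, objs, alpha, max_degree):
            active[c].append(alpha)
    return active


# Toy objects: p, then a and b (close in modality 1, far apart in modality 2).
objs = [(np.array([0.0]), np.array([0.0])),
        (np.array([1.0]), np.array([1.0])),
        (np.array([1.2]), np.array([-1.5]))]
print(sampled_active_alphas(0, [1, 2], objs, 8, [0.0, 0.25, 0.75, 1.0]))
# -> {1: [0.0, 0.25, 0.75, 1.0], 2: [0.0, 0.25]}
```

Here the edge to b survives pruning only for small $\alpha$, where a and b are far apart. A fixed-$\alpha$ index must commit to one of these outcomes at construction time, whereas DEG's per-edge active ranges let the pruning decision follow the query's $\alpha$.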

Paper Structure

This paper contains 31 sections, 3 theorems, 5 equations, 20 figures, 5 tables, and 4 algorithms.

Key Result

Theorem 1

We denote the Pareto frontier of the multi-objective function $f(p, x)$ as $PF(D, p) \subset D\setminus\{p\}$. For any $\alpha \in [0,1]$, the nearest neighbor of $p$ under the $\alpha$-weighted hybrid distance is contained in $PF(D, p)$.
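
Theorem 1 can be checked numerically. The sketch below assumes $f(p, x) = (d_1(p, x), d_2(p, x))$, the two per-modality distances from $p$ to every other object, and a hybrid distance $\alpha d_1 + (1-\alpha) d_2$. `pareto_frontier` computes the exact non-dominated set with a sort-and-sweep, and `check_theorem1` confirms that the hybrid nearest neighbor for each sampled $\alpha$ lies in that set. Both names are hypothetical, and this exhaustive version is not the paper's greedy Pareto frontier search (GPS).

```python
import numpy as np


def pareto_frontier(d1, d2):
    """Indices of objects not dominated when minimizing (d1, d2) jointly.

    d1[i], d2[i] are the two per-modality distances from node p to object i
    (p itself is assumed to be excluded). Exact duplicate pairs keep only one
    representative.
    """
    order = np.lexsort((d2, d1))  # primary key d1, ties broken by d2
    frontier, best_d2 = [], np.inf
    for i in order:
        # Every earlier point has d1 <= d1[i]; i is non-dominated only if
        # its d2 beats all of them.
        if d2[i] < best_d2:
            frontier.append(int(i))
            best_d2 = d2[i]
    return sorted(frontier)


def check_theorem1(d1, d2, alphas):
    """True if the hybrid nearest neighbor for every alpha lies on the frontier.

    Assumes the hybrid distance alpha * d1 + (1 - alpha) * d2.
    """
    pf = set(pareto_frontier(d1, d2))
    return all(int(np.argmin(a * d1 + (1 - a) * d2)) in pf for a in alphas)


rng = np.random.default_rng(0)
d1, d2 = rng.random(200), rng.random(200)  # continuous data: exact ties are unlikely
print(len(pareto_frontier(d1, d2)), check_theorem1(d1, d2, np.linspace(0, 1, 101)))
```

The sweep runs in $O(n \log n)$: after sorting by $d_1$, a point is non-dominated exactly when its $d_2$ is strictly smaller than every $d_2$ seen before it. The theorem follows because if $y$ dominates $x$, then $\alpha d_1(p,y) + (1-\alpha) d_2(p,y) \le \alpha d_1(p,x) + (1-\alpha) d_2(p,x)$ for every $\alpha \in [0,1]$.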

Figures (20)

  • Figure 1: One panel illustrates the HNSW index; the other illustrates the pruning strategy of the Relative Neighborhood Graph (RNG).
  • Figure 2: Experiment results of HNSW$_{\textsf{M}}$ and HNSW$_{\textsf{F}}$ on the OpenImage dataset (upper and right is better).
  • Figure 3: The framework of the DEG, including the index construction phase and index search phase.
  • Figure 4: One panel illustrates the Pareto frontiers; the other illustrates the difference between the edge seed acquisition method and the multiple centroids method.
  • Figure 5: The accuracy-efficiency trade-off results (upper and right is better).
  • ...and 15 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Lemma 1
  • Lemma 2