All-in-one Graph-based Indexing for Hybrid Search on GPUs

Zhonggen Li, Yougen Li, Yifan Zhu, Congcong Ge, Zhaoqiang Chen, Yunjun Gao

TL;DR

Allan-Poe tackles the challenge of achieving high end-to-end relevance, efficiency, and flexibility in hybrid search by proposing a unified, GPU-accelerated graph index. It builds a Unified Semantic Metric Space that fuses dense, sparse, and full-text representations with optional knowledge-graph reasoning, enabling arbitrary path combinations without index reconstruction. The approach combines a GPU-optimized construction pipeline (warp-level hybrid distance, RNG-IP pruning, keyword-edge recycling, and logical-edge augmentation) with a dynamic GPU query framework that loads heterogeneous edges on demand and integrates entity-level relations. Extensive experiments on six real-world datasets show Allan-Poe delivers superior throughput and competitive or superior accuracy while reducing storage overhead compared to state-of-the-art baselines, demonstrating practical impact for search, recommendations, and RAG workloads.

Abstract

Hybrid search has emerged as a promising paradigm to overcome the limitations of single-path retrieval, enhancing accuracy for applications like recommendations, information retrieval, and Retrieval-Augmented Generation. However, existing methods are constrained by a trilemma: they sacrifice flexibility for efficiency, suffer from accuracy degradation due to separate retrievals, or incur prohibitive storage overhead for flexible combinations of retrieval paths. This paper introduces Allan-Poe, a novel All-in-one graph index accelerated by GPUs for efficient hybrid search. We first analyze the limitations of existing retrieval paradigms and distill key design principles for an effective hybrid search index. Guided by these principles, we architect a unified graph-based index that flexibly integrates four retrieval paths (dense vector, sparse vector, full-text, and knowledge graph) within a single, cohesive structure. To enable efficient construction, we design a GPU-accelerated pipeline featuring a warp-level hybrid distance kernel, RNG-IP joint pruning, and keyword-aware neighbor recycling. For query processing, we introduce a dynamic fusion framework that supports any combination of retrieval paths and weights without index reconstruction, leveraging logical edges from the knowledge graph to resolve complex multi-hop queries. Extensive experiments on six real-world datasets demonstrate that Allan-Poe achieves superior end-to-end query accuracy and outperforms state-of-the-art methods by 1.5x to 186.4x in throughput, while significantly reducing storage overhead.

Paper Structure

This paper contains 38 sections, 1 theorem, 13 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 1

Given the RNG-based index constructed from the fused vectors $f_\text{concat}(d)$ in USMS, for any weight vector $w=[w_d,w_s,w_f]\in \mathbb{R}^3$ applied to the query vectors $\{f_\text{dense}(q),f_\text{sparse}(q),f_\text{full}(q)\}$, the nearest neighbors can always be retrieved from the index.
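The intuition behind applying the weights on the query side can be illustrated with a small sketch. It assumes, purely for illustration, that similarity is an inner product and that $f_\text{concat}$ simply concatenates the three per-path vectors; the function names, the toy documents, and the brute-force scan are hypothetical and are not taken from the paper. Under those assumptions, scaling each query segment by its weight and taking one inner product against the stored concatenated vector gives the same score as a weighted sum of per-path scores, so the stored vectors never change when the weights do. The sketch does not prove Theorem 1, which concerns reachability in the graph index; it only shows the score identity.

```python
# Illustrative sketch only: helper names and toy data are assumptions,
# not the paper's implementation. Similarity is assumed to be an inner product.

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def concat(dense, sparse, full):
    """Hypothetical f_concat: plain concatenation of the three path vectors."""
    return list(dense) + list(sparse) + list(full)

def weighted_query(q_dense, q_sparse, q_full, w):
    """Scale each query segment by its path weight, then concatenate."""
    w_d, w_s, w_f = w
    return ([w_d * x for x in q_dense]
            + [w_s * x for x in q_sparse]
            + [w_f * x for x in q_full])

def fused_score(q_parts, d_parts, w):
    """Reference score: weighted sum of the three per-path inner products."""
    return sum(wi * dot(qp, dp) for wi, qp, dp in zip(w, q_parts, d_parts))

# Toy corpus: (dense, sparse, full-text) vectors per document.
docs = {
    "doc_a": ([1.0, 0.0], [0.0, 2.0, 0.0], [1.0]),
    "doc_b": ([0.0, 1.0], [1.0, 0.0, 0.0], [2.0]),
    "doc_c": ([0.5, 0.5], [0.0, 0.0, 1.0], [0.0]),
}
query = ([1.0, 1.0], [0.0, 1.0, 1.0], [1.0])

def rank(w):
    """Rank documents by one inner product against the stored concatenated vectors."""
    q_vec = weighted_query(*query, w)
    scores = {name: dot(q_vec, concat(*parts)) for name, parts in docs.items()}
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores

if __name__ == "__main__":
    for w in [(1.0, 1.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 0.0)]:
        order, _ = rank(w)
        print(w, order)
```

Changing `w` changes the ranking while the document vectors stay fixed, which is the property that makes rebuilding the index unnecessary in this simplified setting. Setting a weight to zero drops the corresponding path from the score entirely.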

Figures (13)

  • Figure 1: Comparison of existing hybrid search paradigms.
  • Figure 2: Gaps between the vector similarity and end-to-end document similarity of two graph-based indexes.
  • Figure 3: Comparison of various retrieval paths using Infinity. DVS, SVS, and FTS denote dense vector, sparse vector, and full-text search, respectively.
  • Figure 4: Example of retrieval in separate paths on NQ. The ground truth documents are doc$_2$ and doc$_4$.
  • Figure 5: Trilemma of existing retrieval methods.
  • ...and 8 more figures

Theorems & Definitions (2)

  • Definition 1: Unified Semantic Metric Space (USMS)
  • Theorem 1