All-in-one Graph-based Indexing for Hybrid Search on GPUs
Zhonggen Li, Yougen Li, Yifan Zhu, Congcong Ge, Zhaoqiang Chen, Yunjun Gao
TL;DR
Allan-Poe tackles the challenge of achieving high end-to-end relevance, efficiency, and flexibility in hybrid search by proposing a unified, GPU-accelerated graph index. It builds a Unified Semantic Metric Space that fuses dense, sparse, and full-text representations with optional knowledge-graph reasoning, enabling arbitrary path combinations without index reconstruction. The approach combines a GPU-optimized construction pipeline (warp-level hybrid distance, RNG-IP pruning, keyword-edge recycling, and logical-edge augmentation) with a dynamic GPU query framework that loads heterogeneous edges on demand and integrates entity-level relations. Extensive experiments on six real-world datasets show Allan-Poe delivers superior throughput and competitive or superior accuracy while reducing storage overhead compared to state-of-the-art baselines, demonstrating practical impact for search, recommendations, and RAG workloads.
Abstract
Hybrid search has emerged as a promising paradigm to overcome the limitations of single-path retrieval, enhancing accuracy for applications like recommendations, information retrieval, and Retrieval-Augmented Generation. However, existing methods are constrained by a trilemma: they sacrifice flexibility for efficiency, suffer from accuracy degradation due to separate retrievals, or incur prohibitive storage overhead for flexible combinations of retrieval paths. This paper introduces Allan-Poe, a novel All-in-one graph index accelerated by GPUs for efficient hybrid search. We first analyze the limitations of existing retrieval paradigms and distill key design principles for an effective hybrid search index. Guided by these principles, we architect a unified graph-based index that flexibly integrates four retrieval paths (dense vector, sparse vector, full-text, and knowledge graph) within a single, cohesive structure. To enable efficient construction, we design a GPU-accelerated pipeline featuring a warp-level hybrid distance kernel, RNG-IP joint pruning, and keyword-aware neighbor recycling. For query processing, we introduce a dynamic fusion framework that supports any combination of retrieval paths and weights without index reconstruction, leveraging logical edges from the knowledge graph to resolve complex multi-hop queries. Extensive experiments on six real-world datasets demonstrate that Allan-Poe achieves superior end-to-end query accuracy and outperforms state-of-the-art methods by 1.5x to 186.4x in throughput, while significantly reducing storage overhead.
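To make the idea of query-time fusion concrete, the following minimal sketch scores a document against a query as a weighted sum of three per-path similarities (dense, sparse, and full-text), with the weights supplied per query. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the fusion formula, so the weighted-sum form, the `Item` type, the path names, and the `hybrid_score` and `rank` functions are all hypothetical, and the brute-force scan stands in for the GPU graph traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical container holding one object's three representations."""
    dense: list[float]                                     # dense embedding
    sparse: dict[int, float] = field(default_factory=dict)  # dim id -> weight
    terms: dict[str, float] = field(default_factory=dict)   # keyword -> weight


def dense_ip(q: list[float], d: list[float]) -> float:
    # Inner product of two dense vectors of equal length.
    return sum(a * b for a, b in zip(q, d))


def sparse_ip(q: dict, d: dict) -> float:
    # Inner product over shared keys; iterate over the smaller map.
    if len(q) > len(d):
        q, d = d, q
    return sum(w * d.get(k, 0.0) for k, w in q.items())


def hybrid_score(query: Item, doc: Item, weights: dict[str, float]) -> float:
    """Assumed fusion rule: weighted sum of per-path similarities.

    A path whose weight is missing or zero is skipped, so the same stored
    data serves any combination of paths without being rebuilt.
    """
    score = 0.0
    if weights.get("dense", 0.0):
        score += weights["dense"] * dense_ip(query.dense, doc.dense)
    if weights.get("sparse", 0.0):
        score += weights["sparse"] * sparse_ip(query.sparse, doc.sparse)
    if weights.get("text", 0.0):
        score += weights["text"] * sparse_ip(query.terms, doc.terms)
    return score


def rank(query: Item, docs: list[Item], weights: dict[str, float]) -> list[int]:
    # Brute-force ranking (highest score first); a graph index would
    # visit only a small subset of docs instead of scanning all of them.
    scores = [hybrid_score(query, d, weights) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Calling `rank` twice on the same `docs` list with different `weights` dictionaries, for example dense-only and then text-only, changes the ordering without touching the stored data, which is the property the abstract describes as supporting any combination of retrieval paths and weights without index reconstruction.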
