Balancing the Blend: An Experimental Analysis of Trade-offs in Hybrid Search
Mengzhao Wang, Boyu Tan, Yunjun Gao, Hai Jin, Yingfeng Zhang, Xiangyu Ke, Xiaoliang Xu, Yifan Zhu
TL;DR
This work presents the first systematic experimental analysis of advanced hybrid search architectures, jointly evaluating four retrieval paradigms (FTS, SVS, DVS, TenS), their combinations, and re-ranking strategies across 11 real-world datasets. Its modular evaluation framework, inspired by the Infinity database, enables fair, scalable comparisons and yields three core findings: a "weakest link" effect, in which a single weak retrieval path constrains hybrid accuracy; a data-driven map of performance trade-offs that argues against one-size-fits-all configurations; and Tensor-based Re-ranking Fusion (TRF) as a practical, high-efficacy re-ranking approach. The study provides concrete guidelines for adaptive, resource-aware hybrid search design and highlights three future directions: reducing tensor-based costs, adaptive path selection, and end-to-end evaluation within RAG pipelines.
Abstract
Hybrid search, the integration of lexical and semantic retrieval, has become a cornerstone of modern information retrieval systems, driven by demanding applications like Retrieval-Augmented Generation (RAG). The architectural design space for these systems is vast and complex, yet a systematic understanding of the trade-offs among their core components -- retrieval paradigms, combination schemes, and re-ranking methods -- is lacking. To address this, and informed by our experience building the Infinity open-source database, we present the first experimental analysis of advanced hybrid search architectures. Our framework integrates four retrieval paradigms -- Full-Text Search (FTS), Sparse Vector Search (SVS), Dense Vector Search (DVS), and Tensor Search (TenS) -- and evaluates their combinations and re-ranking strategies across 11 real-world datasets. Our results reveal three key findings: (1) A "weakest link" phenomenon, where a weak path can substantially degrade overall accuracy, highlighting the need for path-wise quality assessment before fusion. (2) A data-driven map of performance trade-offs, demonstrating that optimal configurations depend heavily on resource constraints and data characteristics, precluding a one-size-fits-all solution. (3) The identification of Tensor-based Re-ranking Fusion (TRF) as a high-efficacy alternative to mainstream fusion methods, offering the semantic power of tensor search at a fraction of the computational and memory cost. Our findings offer concrete guidelines for designing adaptive, scalable hybrid search systems and identify key directions for future research.
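To make the fuse-then-re-rank idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes Reciprocal Rank Fusion (RRF) as a stand-in for a "mainstream fusion method" and a ColBERT-style MaxSim late-interaction score as the tensor re-ranking function; the abstract specifies neither. The function names (`rrf_fuse`, `maxsim`, `tensor_rerank`) and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Assumption: RRF is used here only as a familiar example of rank fusion.
    score(d) = sum over retrieval paths of 1 / (k + rank of d in that path).
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score (assumed MaxSim form).

    For each query token vector, take its best match among the document's
    token vectors, then sum those maxima.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def tensor_rerank(candidates, query_vecs, doc_tensors, top_k):
    """Re-rank a candidate list by MaxSim against per-document token tensors.

    Only the candidates are scored, so the tensor cost is paid on a short
    list rather than on the whole corpus.
    """
    scored = [(maxsim(query_vecs, doc_tensors[c]), c) for c in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [c for _, c in scored[:top_k]]


# Hypothetical toy data: two retrieval paths and 2-d token vectors.
fts_hits = ["a", "b", "c"]
dvs_hits = ["b", "c", "d"]
doc_tensors = {
    "a": [[1.0, 0.0]],
    "b": [[0.0, 1.0]],
    "c": [[1.0, 0.0], [0.0, 1.0]],
    "d": [[0.5, 0.5]],
}
query_vecs = [[1.0, 0.0], [0.0, 1.0]]

fused = rrf_fuse([fts_hits, dvs_hits])
reranked = tensor_rerank(fused, query_vecs, doc_tensors, top_k=2)
```

In this toy setup, rank fusion alone favors documents that appear in both paths, while the re-ranking step can promote a document whose token vectors match every query token. The sketch also hints at why path quality matters: a path that returns poor candidates still contributes rank mass to the fused list.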
