A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation
Tian-Yi Che, Xian-Ling Mao, Chun Xu, Cheng-Xin Xin, Heng-Da Xu, Jin-Yu Liu, Heyan Huang
TL;DR
The paper addresses the fragmentation of retrieval performance across queries and corpora by proposing a Distributed Collaborative Retrieval Framework (DCRF) that unifies sparse, dense, and LLM-based retrievers. It introduces rank-oriented, zero-shot evaluation via four prompting strategies to select the best ranked results without labeled data, enabling flexible, scalable integration of models. Through extensive experiments on BEIR and TREC datasets with multiple open-source and black-box LLMs, DCRF achieves competitive or superior performance and improved efficiency compared to existing methods like RankGPT and ListT5. The framework offers practical impact by reducing maintenance costs, enabling domain adaptation, and providing a robust baseline for future multi-model retrieval systems and rank-aware evaluations.
Abstract
Numerous retrieval models, including sparse, dense, and LLM-based methods, have demonstrated remarkable performance in predicting the relevance between queries and corpora. However, our preliminary effectiveness analysis indicates that these models fail to achieve satisfactory performance on the majority of queries and corpora, revealing that their effectiveness is restricted to specific scenarios. To tackle this problem, we propose a novel Distributed Collaborative Retrieval Framework (DCRF), designed to outperform each single model across all queries and corpora. Specifically, the framework integrates various retrieval models into a unified system and dynamically selects the optimal results for each user's query. It can easily aggregate any retrieval model and extend to any application scenario, illustrating its flexibility and scalability. Moreover, to reduce maintenance and training costs, we design four effective prompting strategies with large language models (LLMs) to evaluate the quality of rankings without relying on labeled data. Extensive experiments demonstrate that the proposed framework, combined with 8 efficient retrieval models, achieves performance comparable to effective listwise methods such as RankGPT and ListT5 while offering superior efficiency. In addition, DCRF surpasses all selected retrieval models on most datasets, indicating the effectiveness of our prompting strategies for rank-oriented automatic evaluation.
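The per-query selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `select_best_ranking`, `toy_retriever_a`, `toy_retriever_b`, and `toy_judge` are hypothetical, the corpus is a three-document toy, and the judge is a simple term-overlap heuristic standing in for the LLM-based zero-shot evaluation, whose four prompting strategies are not specified in this section.

```python
from typing import Callable, Dict, List, Tuple

# A retriever maps a query to a ranked list of document IDs (best first).
Retriever = Callable[[str], List[str]]
# A judge maps (query, ranked list) to a quality score; higher is better.
# Assumption: in DCRF this role is played by a prompted LLM; here it is a stand-in.
RankJudge = Callable[[str, List[str]], float]


def select_best_ranking(
    query: str,
    retrievers: Dict[str, Retriever],
    judge: RankJudge,
) -> Tuple[str, List[str]]:
    """Run every retriever on the query and keep the ranking the judge scores highest."""
    if not retrievers:
        raise ValueError("at least one retriever is required")
    # Each retriever produces its own candidate ranking independently.
    candidates = {name: retrieve(query) for name, retrieve in retrievers.items()}
    # No relevance labels are used: only the judge's score decides the winner.
    best_name = max(candidates, key=lambda name: judge(query, candidates[name]))
    return best_name, candidates[best_name]


# Hypothetical toy corpus, used only to make the example runnable.
TOY_CORPUS = {
    "d1": "sparse retrieval with bm25",
    "d2": "dense retrieval with embeddings",
    "d3": "cooking pasta at home",
}


def toy_retriever_a(query: str) -> List[str]:
    """Hypothetical retriever that returns a fixed, poorly ordered ranking."""
    return ["d3", "d1", "d2"]


def toy_retriever_b(query: str) -> List[str]:
    """Hypothetical retriever that returns a fixed, better ordered ranking."""
    return ["d2", "d1", "d3"]


def toy_judge(query: str, ranking: List[str]) -> float:
    """Hypothetical judge: query-term overlap, discounted by rank position."""
    terms = set(query.lower().split())
    score = 0.0
    for position, doc_id in enumerate(ranking, start=1):
        overlap = len(terms & set(TOY_CORPUS[doc_id].split()))
        score += overlap / position
    return score


best_name, best_ranking = select_best_ranking(
    "dense retrieval",
    {"a": toy_retriever_a, "b": toy_retriever_b},
    toy_judge,
)
print(best_name, best_ranking)
```

For the query "dense retrieval", the toy judge prefers retriever `b`, because it places the most relevant document first. Since the retrievers are independent callables, a new model could be added by registering one more entry in the dictionary, which is one way to read the flexibility claim above.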
