Towards Robust Ranker for Text Retrieval
Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Guodong Long, Binxing Jiao, Daxin Jiang
TL;DR
This work tackles the robustness of text rankers in retrieval–rerank pipelines by addressing two key issues: label noise from strong retrievers and suboptimal negative sampling. It proposes R$^2$anker, a framework that leverages multiple retrievers as negative generators to create open-set and diverse hard negatives, guided by a joint adversarial-like training objective and open-set noise strategies. Empirical results on MS MARCO demonstrate state-of-the-art performance for BM25-reranking and full-ranking, and the ranker can be distilled into a competitive first-stage retriever, enabling efficient end-to-end improvements. Distribution analyses further show that diversity and distribution alignment of negatives are crucial for robust ranker training, underscoring the practical value of multi-generator negative sampling in large-scale IR.
Abstract
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind: it learns from moderate negatives and/or serves as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noise caused by a well-trained retriever and non-ideal negatives sampled for a highly capable ranker. We therefore propose using multiple retrievers as negative generators to improve the ranker's robustness, where i) involving extensive out-of-distribution label noise makes the ranker robust against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging and thus more effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.
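The idea of using multiple retrievers as negative generators can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_joint_negatives`, the retriever names, the passage IDs, and the uniform choice over retrievers are all assumptions made for illustration. The sketch pools each retriever's top-ranked candidates, removes labeled positives, and draws distinct negatives by first picking a retriever and then one of its candidates.

```python
import random
from typing import Dict, List, Set


def sample_joint_negatives(
    candidates_by_retriever: Dict[str, List[str]],
    positives: Set[str],
    num_negatives: int,
    seed: int = 0,
) -> List[str]:
    """Sample distinct hard negatives from the candidates of several retrievers.

    Hypothetical helper: uniform mixing over retrievers is an assumption,
    not a detail taken from the paper.
    """
    rng = random.Random(seed)
    # Drop labeled positives from every retriever's candidate list.
    pools = {
        name: [pid for pid in ranked if pid not in positives]
        for name, ranked in candidates_by_retriever.items()
    }
    chosen: List[str] = []
    seen: Set[str] = set()
    while len(chosen) < num_negatives:
        # Keep only retrievers that still have unused candidates.
        remaining = {
            name: [pid for pid in pool if pid not in seen]
            for name, pool in pools.items()
        }
        remaining = {name: pool for name, pool in remaining.items() if pool}
        if not remaining:
            break  # fewer distinct negatives available than requested
        # Pick a retriever uniformly, then one of its unused candidates.
        name = rng.choice(sorted(remaining))
        pid = rng.choice(remaining[name])
        chosen.append(pid)
        seen.add(pid)
    return chosen


# Toy candidate lists; retriever names and passage IDs are made up.
candidates = {
    "bm25": ["p1", "p2", "p3"],
    "dense_a": ["p2", "p4", "p9"],
    "dense_b": ["p5", "p1"],
}
negatives = sample_joint_negatives(candidates, positives={"p1"}, num_negatives=4)
```

Because negatives come from several retrievers rather than one, the sampled set mixes lexical and dense hard negatives. This is one simple way to approximate sampling from a joint distribution; a weighted choice over retrievers would be an equally plausible variant.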
