Towards Trustworthy Reranking: A Simple yet Effective Abstention Mechanism
Hippolyte Gisserot-Boukhlef, Manuel Faysse, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo
TL;DR
This work tackles trustworthy neural information retrieval by enabling abstention in the reranking stage under black-box constraints. It introduces two confidence-estimation paradigms: a reference-free approach using simple score statistics and a data-driven approach using a reference set to calibrate a regression-based confidence function, with a threshold-based abstention decision. The main result is that a reference-based linear regression confidence ($u_{\text{lin}}$) consistently outperforms reference-free baselines across six multilingual datasets, achieving higher $nAUC_m$ while incurring negligible overhead ($\approx 1.2\%$ of relevance-score computation time). Domain transfer experiments show that calibration quality depends on the distributional similarity between the reference and target data and on the number of positives per instance, but small calibration sets can suffice. Overall, the proposed abstention mechanism offers a practical, training-light method to enhance the trustworthiness and efficiency of IR pipelines such as RAG and search systems.
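The two paradigms summarized above can be illustrated with a minimal sketch. It rests on several assumptions not stated in this summary: the regression features are taken to be the relevance scores of each query sorted in descending order, the regression target is some per-query ranking-quality metric computed on the labeled reference set, and all function names, the synthetic data, and the threshold value are hypothetical. It is not the authors' released implementation.

```python
import numpy as np

def sorted_scores(scores):
    """Sort relevance scores in descending order along the last axis."""
    return -np.sort(-np.asarray(scores, dtype=float), axis=-1)

def u_max(scores):
    """Reference-free confidence (assumed example): the highest relevance score."""
    return np.max(np.asarray(scores, dtype=float), axis=-1)

def fit_u_lin(ref_scores, ref_quality):
    """Fit a linear confidence function on a labeled reference set.

    ref_scores:  (n_queries, k) relevance scores from the black-box reranker.
    ref_quality: (n_queries,) ranking-quality metric per reference query.
    Returns the weight vector and the intercept of the least-squares fit.
    """
    X = sorted_scores(ref_scores)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(X1, np.asarray(ref_quality, dtype=float), rcond=None)
    return coef[:-1], coef[-1]

def u_lin(scores, w, b):
    """Reference-based confidence: predicted ranking quality for new scores."""
    return sorted_scores(scores) @ w + b

def should_abstain(confidence, tau):
    """Threshold-based decision: abstain when confidence falls below tau."""
    return confidence < tau

# Synthetic reference set (illustrative only): quality is driven by the
# margin between the top two scores, so the fit should recover that rule.
rng = np.random.default_rng(0)
ref_scores = rng.normal(size=(50, 4))
S = sorted_scores(ref_scores)
ref_quality = 0.5 * (S[:, 0] - S[:, 1]) + 0.1

w, b = fit_u_lin(ref_scores, ref_quality)

tau = 0.5  # hypothetical threshold; in practice tuned on held-out data
clear_winner = [0.1, 2.0, 0.3, 0.2]   # one document stands out
ambiguous = [0.5, 0.4, 0.45, 0.5]     # no document stands out
conf_clear = u_lin(clear_winner, w, b)
conf_ambiguous = u_lin(ambiguous, w, b)
```

Because the fitted function is a single dot product over scores the reranker has already produced, computing it adds little cost on top of relevance scoring, which is in line with the small overhead reported above. In this toy setup `should_abstain(conf_ambiguous, tau)` flags the second query while the first is kept.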
Abstract
Neural Information Retrieval (NIR) has significantly improved upon heuristic-based Information Retrieval (IR) systems. Yet failures remain frequent, as the models used are often unable to retrieve documents relevant to the user's query. We address this challenge by proposing a lightweight abstention mechanism tailored for real-world constraints, with particular emphasis on the reranking phase. We introduce a protocol for evaluating abstention strategies in black-box scenarios (typically encountered when relying on API services), demonstrate their efficacy, and propose a simple yet effective data-driven mechanism. We provide open-source code for experiment replication and abstention implementation, fostering wider adoption and application in diverse contexts.
