Two-stage Risk Control with Application to Ranked Retrieval
Yunpeng Xu, Mufang Ying, Wenge Guo, Zhi Wei
TL;DR
This work addresses uncertainty quantification and risk control in two-stage ranked retrieval systems (retrieval followed by ranking). It introduces two complementary frameworks, learn-then-test (LTT) and two-stage conformal risk control (CRC), that jointly control the first-stage retrieval risk and the second-stage ranking risk while exploiting the sequential structure to reduce computation. The authors provide finite-sample and asymptotic guarantees, design task-specific loss functions (a retrieval loss and a modified nDCG-based ranking loss), and validate the approach on two large-scale datasets (MSLR-Web and Yahoo LTRC), where the methods control risk while keeping recall and precision competitive. The methods are model-agnostic and extend to non-monotone losses and to more than two stages, which makes them practical for real-world retrieval and ranking pipelines.
Abstract
Practical machine learning systems often operate in multiple sequential stages. Ranking and recommendation systems, for example, typically include a retrieval phase followed by a ranking phase. Assessing prediction uncertainty and controlling risk in such systems is challenging because of their inherent complexity. To address these challenges, we develop two-stage risk control methods based on the recently proposed learn-then-test (LTT) and conformal risk control (CRC) frameworks. Unlike methods in prior work that address multiple risks, our approach leverages the sequential nature of the problem, which reduces the computational burden. We provide theoretical guarantees for the proposed methods and design novel loss functions tailored to ranked retrieval tasks. The effectiveness of our approach is validated through experiments on two large-scale, widely used datasets: MSLR-Web and Yahoo LTRC.
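For orientation, the sketch below illustrates standard single-stage conformal risk control applied to a retrieval stage. It is not the two-stage procedure proposed in this work. It assumes a loss bounded by B that is non-increasing in a threshold parameter lambda, and it picks the smallest lambda whose inflated empirical risk, n/(n+1) times the mean calibration loss plus B/(n+1), is at most alpha. The function names `retrieval_loss` and `crc_threshold`, the "one minus recall" loss, and the synthetic data are hypothetical choices made for illustration only.

```python
import numpy as np


def retrieval_loss(scores, relevant, lam):
    """Hypothetical retrieval loss: 1 - recall of the set {i : scores[i] >= 1 - lam}.

    Larger lam retrieves more items, so the loss is non-increasing in lam.
    """
    retrieved = scores >= 1.0 - lam
    n_rel = relevant.sum()
    if n_rel == 0:
        return 0.0  # assumption: a query with no relevant items incurs no loss
    return 1.0 - (retrieved & relevant).sum() / n_rel


def crc_threshold(losses, lambdas, alpha, B=1.0):
    """Single-stage conformal risk control on a grid of lambdas (ascending).

    losses has shape (n, len(lambdas)); each row is one calibration example
    and is assumed non-increasing in lambda and bounded by B.
    Returns the smallest lambda with n/(n+1) * mean_loss + B/(n+1) <= alpha,
    or the largest grid value if no lambda qualifies.
    """
    n = losses.shape[0]
    bound = n / (n + 1) * losses.mean(axis=0) + B / (n + 1)
    ok = np.nonzero(bound <= alpha)[0]
    return lambdas[ok[0]] if ok.size else lambdas[-1]


if __name__ == "__main__":
    # Synthetic calibration set: 200 queries with 50 candidates each.
    rng = np.random.default_rng(0)
    lambdas = np.linspace(0.0, 1.0, 101)
    n, m = 200, 50
    losses = np.empty((n, lambdas.size))
    for i in range(n):
        relevant = rng.random(m) < 0.2
        # Relevant items receive higher scores on average; scores lie in [0, 1).
        scores = 0.5 * relevant + 0.5 * rng.random(m)
        losses[i] = [retrieval_loss(scores, relevant, lam) for lam in lambdas]
    print("calibrated lambda:", crc_threshold(losses, lambdas, alpha=0.1))
```

Searching a fixed grid keeps the example short. Because the loss is monotone in lambda, a binary search over the grid would return the same threshold with fewer evaluations.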
