Two-stage Risk Control with Application to Ranked Retrieval

Yunpeng Xu; Mufang Ying; Wenge Guo; Zhi Wei

TL;DR

This work tackles uncertainty quantification and risk control in two-stage ranked retrieval systems (retrieval followed by ranking). It introduces two complementary frameworks—learn-then-test (LTT) and two-stage conformal risk control (CRC)—to jointly control the first-stage retrieval risk and the second-stage ranking risk, all while leveraging the sequential structure to reduce computation. The authors provide finite-sample and asymptotic guarantees, design task-specific loss functions (retrieval loss and a modified nDCG-based ranking loss), and validate the approach on large-scale datasets (MSLR-Web and Yahoo LTRC), showing effective risk management with competitive recall and precision. The methods are model-agnostic and extensible to non-monotone losses and to multiple stages, making them practical for deployment in real-world retrieval and ranking pipelines.

Abstract

Practical machine learning systems often operate in multiple sequential stages, as seen in ranking and recommendation systems, which typically include a retrieval phase followed by a ranking phase. Assessing prediction uncertainty and controlling risk in such systems pose significant challenges due to their inherent complexity. To address these challenges, we develop two-stage risk control methods based on the recently proposed learn-then-test (LTT) and conformal risk control (CRC) frameworks. Unlike methods in prior work that address multiple risks, our approach leverages the sequential nature of the problem, resulting in a reduced computational burden. We provide theoretical guarantees for our proposed methods and design novel loss functions tailored for ranked retrieval tasks. The effectiveness of our approach is validated through experiments on two large-scale, widely used datasets: MSLR-Web and Yahoo LTRC.
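For orientation, the following is a minimal sketch of the standard single-stage CRC calibration rule that the abstract builds on. It is not the two-stage procedure proposed in the paper. It assumes a loss bounded by `B` and non-increasing in the tuning parameter, and picks the smallest parameter whose adjusted empirical risk, (n/(n+1)) times the mean calibration loss plus B/(n+1), is at most `alpha`. The function name `crc_threshold`, the synthetic scores and relevance labels, and the "fraction of relevant documents missed in the top k" loss are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def crc_threshold(losses, lambdas, alpha, B=1.0):
    """Single-stage conformal risk control (illustrative sketch).

    losses[i, j] is the loss of calibration example i at lambdas[j].
    Assumes lambdas is ascending and the loss is non-increasing in lambda
    and bounded above by B. Returns the smallest lambda whose adjusted
    empirical risk is <= alpha, or the largest lambda if none qualifies.
    """
    n = losses.shape[0]
    adjusted = (n / (n + 1)) * losses.mean(axis=0) + B / (n + 1)
    ok = np.nonzero(adjusted <= alpha)[0]
    return lambdas[ok[0]] if ok.size else lambdas[-1]

# Synthetic calibration data (hypothetical): 500 queries, 50 candidates each.
rng = np.random.default_rng(0)
n, n_docs = 500, 50
scores = rng.normal(size=(n, n_docs))
relevant = (scores + rng.normal(size=(n, n_docs))) > 1.0

# Retrieval loss at cutoff k: fraction of relevant documents outside the top k.
order = np.argsort(-scores, axis=1)
rel_sorted = np.take_along_axis(relevant, order, axis=1)
hits = np.cumsum(rel_sorted, axis=1)              # relevant docs within top k
total = relevant.sum(axis=1, keepdims=True)
losses = (total - hits) / np.maximum(total, 1)    # 0 when a query has no relevant docs

ks = np.arange(1, n_docs + 1)
k_hat = crc_threshold(losses, ks, alpha=0.1)
print("calibrated retrieval cutoff:", k_hat)
```

Here the tuning parameter is the retrieval cutoff `k`, so a larger `k` can only lower the loss, which is the monotonicity this rule relies on.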

Paper Structure (31 sections, 5 theorems, 58 equations, 3 figures, 2 tables)

Introduction
Conformal prediction
Ranked retrieval
Problem setup
Data structure in ranked retrieval problem
Two-stage risk control
LTT framework
Procedure:
Expected risk control
Finite-sample second-stage risk control
Application to ranked retrieval
Loss function for retrieval stage
Loss function for ranking stage
Parameter pair selection via empirical set sizes minimization
Experiments
...and 16 more sections

Key Result

Theorem 1

Let $\mathcal{R}$ denote the collection of tuning parameter pairs returned from a FWER controlling algorithm testing $\mathcal{F}^{\hbox{(1)}} \cup \mathcal{F}^{\hbox{(2)}}$ at level $\delta$. Then, we have Therefore, we have

Figures (3)

Figure 1: Graphical ordering of hypothesis tests
Figure 2: MSLR dataset
Figure 3: Yahoo dataset

Theorems & Definitions (5)

Theorem 1
Theorem 2
Corollary 1
Theorem 3
Corollary 2
