Principled Approaches for Learning to Defer with Multiple Experts

Anqi Mao; Mehryar Mohri; Yutao Zhong

TL;DR

This work advances learning to defer with multiple experts by formulating a general surrogate-loss framework that jointly learns predictions and deferrals. It establishes ${\mathscr H}$-consistency bounds for these surrogates, incorporating minimizability gaps to yield potentially tighter, non-asymptotic guarantees than traditional excess-error bounds. The theory encompasses a broad class of surrogate losses (comp-sum, sum, and constrained variants) and derives finite-sample learning bounds using Rademacher complexity, connecting estimation error to the deferral setup. Empirically, the method demonstrates improved accuracy on SVHN and CIFAR-10 as the number of available experts grows, validating the practical value of multi-expert deferral with principled loss designs.

Abstract

We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong ${\mathscr H}$-consistency bounds. We illustrate the application of our analysis through several examples of practical surrogate losses, for which we give explicit guarantees. These loss functions readily lead to the design of new learning to defer algorithms based on their minimization. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on SVHN and CIFAR-10 datasets.
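To make the joint prediction-and-deferral setup concrete, here is a minimal sketch of a score-based deferral loss and one possible surrogate for it. Everything in it is an illustrative assumption, not the paper's stated definition: the hypothesis outputs `n + n_e` scores (labels first, then experts), the deferral cost is taken to be each expert's zero-one error, the base loss is the multi-class logistic loss, and the surrogate is assumed to have the form $\ell(h, x, y) + \sum_j (1 - c_j(x, y))\,\ell(h, x, n + j)$. The function names are hypothetical.

```python
import math

def deferral_loss(pred, y, n, costs):
    """Target deferral loss for one example (assumed form).

    pred  : index in [0, n + n_e); values < n are labels, n + j defers to expert j.
    y     : true label in [0, n).
    costs : costs[j] is the cost of deferring to expert j on this example.
    """
    if pred < n:
        return 0.0 if pred == y else 1.0
    return costs[pred - n]

def log_softmax_loss(scores, k):
    """Multi-class logistic loss: -log softmax(scores)[k], computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[k]

def surrogate_deferral_loss(scores, y, n, costs):
    """Assumed surrogate: ell(h, x, y) + sum_j (1 - c_j) * ell(h, x, n + j)."""
    total = log_softmax_loss(scores, y)
    for j, c in enumerate(costs):
        total += (1.0 - c) * log_softmax_loss(scores, n + j)
    return total

# n = 3 labels and n_e = 2 experts (hypothetical scores and expert predictions).
n = 3
scores = [0.2, 1.5, -0.3, 0.9, 0.1]
y = 1
expert_preds = [1, 2]
# Assumption: the cost of deferring is the expert's zero-one error.
costs = [0.0 if p == y else 1.0 for p in expert_preds]

pred = max(range(len(scores)), key=lambda k: scores[k])
target = deferral_loss(pred, y, n, costs)
surrogate = surrogate_deferral_loss(scores, y, n, costs)
```

Under this assumed form, an expert that is correct on the example (cost 0) contributes a full logistic term that pushes its score up, while an expert that is wrong (cost 1) contributes nothing, so a single set of scores is trained for both predicting and deferring.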

Paper Structure (33 sections, 8 theorems, 66 equations, 1 figure, 6 tables)

This paper contains 33 sections, 8 theorems, 66 equations, 1 figure, 6 tables.

Introduction
Preliminaries
General surrogate losses
${\mathscr H}$-consistency bounds for surrogate losses
Benefits of minimizability gaps
Learning bounds
Experiments
Conclusion
Related work
Experimental details
Experimental setup.
Additional experiments.
Proof of ${\mathscr H}$-consistency bounds for deferral surrogate losses
Conditional regret of the deferral loss
Conditional regret of a surrogate deferral loss
...and 18 more sections

Key Result

theorem 1

Assume that $\ell$ admits an ${\mathscr H}$-consistency bound with respect to the multi-class zero-one classification loss $\ell_{0-1}$. Thus, there exists a non-decreasing concave function $\Gamma$ with $\Gamma(0)=0$ such that, for any distribution ${\mathscr D}$ and for all $h \in {\mathscr H}$, we have [...]. Then, ${\mathsf L}$ admits the following ${\mathscr H}$-consistency bound with respect to [...]

Figures (1)

Figure 1: Illustration of the scenario of learning to defer with multiple experts ($n=3$ and ${n_e}=2$).

Theorems & Definitions (12)

theorem 1: ${\mathscr H}$-consistency bounds for score-based surrogates
corollary 1
theorem 2: Learning bound
lemma 1
proof
lemma 2
proof
lemma 3
theorem 2: ${\mathscr H}$-consistency bounds for score-based surrogates
proof
...and 2 more
