Principled Approaches for Learning to Defer with Multiple Experts

Anqi Mao, Mehryar Mohri, Yutao Zhong

TL;DR

This work advances learning to defer with multiple experts by formulating a general surrogate-loss framework that jointly learns predictions and deferrals. It establishes ${\mathscr H}$-consistency bounds for these surrogates, incorporating minimizability gaps to yield potentially tighter, non-asymptotic guarantees than traditional excess-error bounds. The theory encompasses a broad class of surrogate losses (comp-sum, sum, and constrained variants) and derives finite-sample learning bounds using Rademacher complexity, connecting estimation error to the deferral setup. Empirically, the method demonstrates improved accuracy on SVHN and CIFAR-10 as the number of available experts grows, validating the practical value of multi-expert deferral with principled loss designs.

Abstract

We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong ${\mathscr H}$-consistency bounds. We illustrate the application of our analysis through several examples of practical surrogate losses, for which we give explicit guarantees. These loss functions readily lead to the design of new learning-to-defer algorithms based on their minimization. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on the SVHN and CIFAR-10 datasets.
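
As a rough illustration of what learning the prediction and deferral functions simultaneously can look like at decision time, the sketch below uses a single score vector with one entry per label followed by one entry per expert, matching the $n=3$, ${n_e}=2$ scenario of Figure 1. The function names, the example scores, and the choice of an expert's zero-one error as its deferral cost are assumptions made for illustration only; they are not taken from the text reproduced here.

```python
import numpy as np


def predict_or_defer(scores, n_classes):
    """Pick the top-scoring entry of a joint score vector.

    Entries 0..n_classes-1 are labels; the remaining entries are experts
    (hypothetical layout, assumed for this sketch).
    """
    k = int(np.argmax(scores))
    if k < n_classes:
        return ("predict", k)
    return ("defer", k - n_classes)


def deferral_loss(scores, y, expert_costs, n_classes):
    """Zero-one loss when predicting, the chosen expert's cost when deferring."""
    kind, idx = predict_or_defer(scores, n_classes)
    if kind == "predict":
        return float(idx != y)
    return float(expert_costs[idx])


n_classes = 3          # n = 3 labels
y = 1                  # true label
expert_preds = [1, 2]  # n_e = 2 experts: the first is right, the second is wrong
# Assumption: the cost of deferring to an expert is that expert's zero-one error.
expert_costs = [float(p != y) for p in expert_preds]

# Top score sits on the first expert slot, so the model defers to expert 0.
scores_defer = np.array([0.1, 0.3, 0.2, 0.9, 0.4])
loss_defer = deferral_loss(scores_defer, y, expert_costs, n_classes)

# Top score sits on label 0, so the model predicts it (incorrectly here).
scores_predict = np.array([0.8, 0.3, 0.2, 0.1, 0.4])
loss_predict = deferral_loss(scores_predict, y, expert_costs, n_classes)
```

In this toy setup, deferring to the correct expert incurs no loss while the wrong direct prediction incurs a loss of one. That quantity is not differentiable in the scores, which is the usual motivation for optimizing a surrogate loss instead.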

Paper Structure (33 sections, 8 theorems, 66 equations, 1 figure, 6 tables)

This paper contains 33 sections, 8 theorems, 66 equations, 1 figure, 6 tables.

Key Result

theorem 1

Assume that $\ell$ admits an ${\mathscr H}$-consistency bound with respect to the multi-class zero-one classification loss $\ell_{0-1}$. Thus, there exists a non-decreasing concave function $\Gamma$ with $\Gamma(0)=0$ such that, for any distribution ${\mathscr D}$ and for all $h \in {\mathscr H}$, [...] Then, ${\mathsf L}$ admits the following ${\mathscr H}$-consistency bound with respect to [...]

Figures (1)

  • Figure 1: Illustration of the scenario of learning to defer with multiple experts ($n=3$ and ${n_e}=2$).

Theorems & Definitions (12)

  • theorem 1: ${\mathscr H}$-consistency bounds for score-based surrogates
  • corollary 1
  • theorem 2: Learning bound
  • lemma 1
  • proof
  • lemma 2
  • proof
  • lemma 3
  • theorem 2: ${\mathscr H}$-consistency bounds for score-based surrogates
  • proof
  • ...and 2 more