
Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer

Anqi Mao, Mehryar Mohri, Yutao Zhong

TL;DR

This work tackles learning to defer with multiple experts by introducing principled surrogate losses with strong theoretical guarantees, including realizable $\mathscr{H}$-consistency, $\mathscr{H}$-consistency bounds, and Bayes-consistency, in both single-stage and two-stage settings. It develops a family of comp-sum surrogate losses $L_{\Psi}$ and margin-based surrogates $L_{\Phi}$, derives realizability under scaling, and provides enhanced bounds under Tsybakov-type low-noise assumptions. The bounds are explicit and depend on the number of experts $n_e$ and on the expert costs. Experiments on standard vision datasets corroborate the theory, demonstrating practical deferral behavior and realizable $\mathscr{H}$-consistency in both realizable and non-realizable regimes. The findings advance principled routing among experts and have practical implications for resource-constrained systems and large-language-model ensembles, where selective delegation can improve reliability and efficiency.

Abstract

The problem of learning to defer with multiple experts consists of optimally assigning input instances to experts, balancing the trade-off between their accuracy and computational cost. This is a critical challenge in natural language generation, but also in other fields such as image processing and medical diagnostics. Recent studies have proposed surrogate loss functions to optimize deferral, but challenges remain in ensuring their consistency properties. This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage (jointly learning predictor and deferral function) and two-stage (learning only the deferral function with a fixed expert) learning scenarios. For single-stage deferral, we introduce a family of new realizable $H$-consistent surrogate losses and further prove $H$-consistency for a selected member. For two-stage deferral, we derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, for the multiple-expert scenario. Additionally, we provide enhanced theoretical guarantees under low-noise assumptions for both scenarios. Finally, we report the results of experiments using our proposed surrogate losses, comparing their performance against existing baselines.
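To make the deferral decision concrete, here is a minimal sketch of the kind of target loss the surrogates are meant to approximate. It assumes a common convention that is not taken from this paper's text: a scoring function outputs $n + n_e$ scores, the first $n$ indices are class labels, index $n + j$ means "defer to expert $j$", predicting directly costs the zero-one loss, and deferring to expert $j$ costs a given $c_j(x, y)$. The function name `deferral_loss` and its parameters are hypothetical.

```python
from typing import Sequence


def deferral_loss(scores: Sequence[float], y: int,
                  expert_costs: Sequence[float], n_classes: int) -> float:
    """Hypothetical multiple-expert deferral loss for a single example.

    Assumed convention (illustrative only): indices 0..n_classes-1 are
    class labels; index n_classes + j means "defer to expert j", whose
    cost c_j(x, y) is supplied in expert_costs[j].
    """
    if len(scores) != n_classes + len(expert_costs):
        raise ValueError("scores must have n_classes + n_e entries")
    # h(x): index of the highest score; max() keeps the first maximum on ties.
    choice = max(range(len(scores)), key=lambda k: scores[k])
    if choice < n_classes:
        # Predict directly: zero-one loss against the true label.
        return 0.0 if choice == y else 1.0
    # Defer: pay the assumed cost of the selected expert.
    return float(expert_costs[choice - n_classes])


if __name__ == "__main__":
    costs = [0.2, 1.0]  # illustrative costs for two experts
    # Highest score is class 1, which matches y = 1: loss 0.0.
    print(deferral_loss([0.1, 2.0, 0.3, 0.5, -1.0], y=1, expert_costs=costs, n_classes=3))
    # Highest score is index 3, i.e. defer to expert 0: loss 0.2.
    print(deferral_loss([0.1, 0.2, 0.3, 1.5, -1.0], y=0, expert_costs=costs, n_classes=3))
```

Because this loss is discontinuous in the scores, it cannot be minimized directly by gradient methods, which is what motivates smooth surrogates such as the comp-sum family $L_{\Psi}$.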

Paper Structure

This paper contains 39 sections, 30 theorems, 113 equations, 1 figure, and 3 tables.

Key Result

Lemma 3.0

The deferral loss can be expressed as follows: $\forall (h, x, y) \in {\mathscr H}_{\rm{all}} \times {\mathscr X} \times {\mathscr Y}$,

Figures (1)

  • Figure 1: System Accuracy vs. Training Sample Size.

Theorems & Definitions (49)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Lemma 3.0
  • Theorem 3.1
  • Theorem 3.2: Theorem 4.1 in maorealizable
  • Theorem 3.3
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • ...and 39 more