Adversarial Robustness in One-Stage Learning-to-Defer

Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

TL;DR

This work introduces the first framework for adversarial robustness in one-stage Learning-to-Defer (L2D), addressing both classification and regression. It formalizes untargeted and targeted attacks that perturb inputs to influence both predictions and deferral decisions, and it proposes cost-sensitive, tractable surrogate losses with theoretical guarantees including Bayes-consistency, $\mathcal{H}$-consistency for classification, and $(\mathcal{R},\mathcal{F})$-consistency for regression. The authors also present smooth adversarial surrogates and regularized ERM algorithms (RERM-C for classification and RERM-R for regression) to achieve robust end-to-end L2D while preserving clean accuracy. Empirical results on CIFAR-10, DermaMNIST, Communities and Crime, and COIL 2000 demonstrate substantial robustness improvements under both untargeted and targeted attacks, with lower adversarial deferral losses and competitive or better clean performance. Overall, the paper provides both theoretical foundations and practical methods for deploying robust one-stage L2D systems in safety-critical settings where both predictions and deferrals must be reliable under adversarial perturbations.
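The sketch below makes the one-stage setting concrete: a single score vector both predicts and defers. It is a minimal sketch built on assumptions, not the paper's implementation. It assumes the common parameterization in which the model outputs $n$ class scores followed by one score per expert. It also assumes hypothetical agent costs: $c_0 = 1\{h(x) \neq y\}$ for the model and $c_j = \alpha_j 1\{m_j \neq y\} + \beta_j$ for expert $j$, where $\alpha_j$ and $\beta_j$ are illustrative error weights and consultation fees. The function names `decide`, `agent_costs` and `true_deferral_loss` are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def decide(scores: Sequence[float], n_classes: int) -> Tuple[int, int]:
    """Split one score vector of length n_classes + J into (agent, model_class).

    agent == 0 means the model answers with model_class;
    agent == j >= 1 means the input is deferred to expert j.
    (Hypothetical parameterization: class scores first, then one score per expert.)
    """
    k = argmax(scores)
    model_class = argmax(list(scores)[:n_classes])
    agent = 0 if k < n_classes else k - n_classes + 1
    return agent, model_class


def agent_costs(y: int, model_class: int, expert_preds: Sequence[int],
                alpha: Sequence[float], beta: Sequence[float]) -> List[float]:
    """Assumed cost of every agent: index 0 is the model, index j >= 1 is expert j."""
    costs = [float(model_class != y)]  # 0-1 error of the model's own prediction
    for m_j, a_j, b_j in zip(expert_preds, alpha, beta):
        costs.append(a_j * float(m_j != y) + b_j)  # weighted expert error + consultation fee
    return costs


def true_deferral_loss(scores: Sequence[float], n_classes: int, y: int,
                       expert_preds: Sequence[int], alpha: Sequence[float],
                       beta: Sequence[float]) -> float:
    """Cost of whichever agent the one-stage argmax rule selects."""
    agent, model_class = decide(scores, n_classes)
    return agent_costs(y, model_class, expert_preds, alpha, beta)[agent]


# 3 classes, 2 experts: the largest score (3.0) belongs to expert 2.
example_loss = true_deferral_loss([2.0, 0.5, 0.1, 1.0, 3.0], 3, 1,
                                  expert_preds=[1, 1], alpha=[1.0, 1.0], beta=[0.1, 0.3])
# example_loss == 0.3: expert 2 is correct, so only its fee beta_2 is paid
```

This loss is piecewise constant in the scores, so it cannot be minimized by gradient descent directly. That is why tractable surrogate losses are needed to train the system.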

Abstract

Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including $\mathcal{H}$, $(\mathcal{R}, \mathcal{F})$, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.
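The $\mathcal{H}$-consistency guarantees mentioned above follow the framework that Theorem 1 cites (Awasthi, Mao, Mohri and Zhong, 2022). In that framework, a surrogate $\Phi$ admits an $\mathcal{H}$-consistency bound with respect to a target loss $\ell$ when, for some non-decreasing $\Gamma$ and every $h \in \mathcal{H}$, the generic inequality below holds. The symbols $\mathcal{E}$ (expected loss), $\mathcal{E}^*$ (best-in-class expected loss) and $\mathcal{U}$ (minimizability gap) are introduced here for illustration. The paper's own statement and notation may differ.

```latex
\mathcal{E}_{\ell}(h) - \mathcal{E}^*_{\ell}(\mathcal{H}) + \mathcal{U}_{\ell}(\mathcal{H})
  \;\le\; \Gamma\!\left(\mathcal{E}_{\Phi}(h) - \mathcal{E}^*_{\Phi}(\mathcal{H}) + \mathcal{U}_{\Phi}(\mathcal{H})\right),
\qquad
\mathcal{E}^*_{\ell}(\mathcal{H}) = \inf_{h \in \mathcal{H}} \mathcal{E}_{\ell}(h),
\qquad
\mathcal{U}_{\ell}(\mathcal{H}) = \mathcal{E}^*_{\ell}(\mathcal{H})
  - \mathbb{E}_{x}\Big[\inf_{h \in \mathcal{H}} \mathbb{E}_{y \mid x}\big[\ell(h, x, y)\big]\Big].
```

Take $\ell = \ell^{c}_{\text{def}}$, $\Phi = \Phi^{c,u}_{\text{def}}$ and $\Gamma = \Gamma^u$, and suppose $\Gamma(0) = 0$ with $\Gamma$ continuous at $0$. Under these conditions, driving the surrogate's excess error to zero also drives the deferral loss's excess error to zero. When $\mathcal{H}$ is the family of all measurable functions, both minimizability gaps vanish, and the bound yields Bayes-consistency.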

Paper Structure

This paper contains 51 sections, 16 theorems, 92 equations, 8 tables.

Key Result

Theorem 1

The surrogate $\Phi_{\text{def}}^{c, u}$ is $\mathcal{H}$-consistent with respect to $\ell_{\text{def}}^{c}$ if there exists a non-decreasing function $\Gamma^u:\mathbb{R}^+\to\mathbb{R}^+$ such that for every distribution $\mathcal{D}$,

Theorems & Definitions (27)

  • Theorem 1: $\mathcal{H}$-consistency bounds [Awasthi_Mao_Mohri_Zhong_2022_multi]
  • Definition 1: Untargeted Attack
  • Definition 1: Targeted Attack
  • Remark 2
  • Lemma 2: Adversarial True Deferral Loss for Classification
  • Definition 2: Adversarial Surrogate Deferral Loss for Classification
  • Theorem 3: $\mathcal{H}$-consistency bounds of $\widetilde{\Phi}^{c,u}_{\text{def}}$
  • Lemma 3: Smooth Adversarial Surrogate Losses
  • Definition 3: Smooth Adversarial Surrogate Deferral Loss for Classification
  • Corollary 3: Guarantees for $\widetilde{\Phi}^{c,u}_{\mathrm{def,s}}$
  • ...and 17 more
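The untargeted and targeted attacks, and the adversarial deferral losses, can be illustrated with a small example. Adversarial losses of this kind are typically evaluated at a worst-case input within a perturbation budget. A standard way to approximate that inner supremum is projected gradient descent (PGD) with sign steps in an $\ell_\infty$ ball. The sketch below does this against a toy model, under several assumptions that do not come from the paper:

- a hypothetical linear scorer $z = Wx$ over the augmented labels (classes, then experts);
- a cost-weighted cross-entropy surrogate $-\sum_k (1 - c_k) \log p_k$, which is in the spirit of the L2D cross-entropy surrogate of Mozannar and Sontag (2020) rather than the paper's $\Phi^{c,u}_{\text{def}}$;
- a targeted variant that pushes the argmax toward one chosen label, for example routing the input to a particular expert.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def log_softmax(z: Sequence[float]) -> Vector:
    """Numerically stable log-softmax."""
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]


def scores(W: Matrix, x: Sequence[float]) -> Vector:
    """Hypothetical linear scorer z = W x: one row of W per augmented label."""
    return [sum(w_d * x_d for w_d, x_d in zip(row, x)) for row in W]


def surrogate_loss(W: Matrix, x: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed cost-weighted cross-entropy: -sum_k weights[k] * log p_k(x),
    with weights[k] = 1 - cost_k for each augmented label k."""
    return -sum(w * lp for w, lp in zip(weights, log_softmax(scores(W, x))))


def input_gradient(W: Matrix, x: Sequence[float], weights: Sequence[float],
                   target: Optional[int]) -> Vector:
    """dL/dx for the surrogate (untargeted) or for -log p_target (targeted)."""
    p = [math.exp(v) for v in log_softmax(scores(W, x))]
    if target is None:
        total = sum(weights)
        g_z = [p_i * total - w_i for p_i, w_i in zip(p, weights)]  # d surrogate / dz
    else:
        g_z = [p_i - float(i == target) for i, p_i in enumerate(p)]  # d(-log p_t) / dz
    # chain rule through z = W x: dL/dx = W^T dL/dz
    return [sum(g_z[i] * W[i][d] for i in range(len(W))) for d in range(len(x))]


def pgd_attack(W: Matrix, x: Sequence[float], weights: Sequence[float], eps: float,
               step: float, n_steps: int, target: Optional[int] = None) -> Vector:
    """l_inf PGD: ascend the surrogate (untargeted) or descend -log p_target (targeted)."""
    x_adv = list(x)
    direction = 1.0 if target is None else -1.0
    for _ in range(n_steps):
        g = input_gradient(W, x_adv, weights, target)
        # signed gradient step: (g > 0) - (g < 0) is the sign of g
        x_adv = [xa + direction * step * ((gd > 0) - (gd < 0)) for xa, gd in zip(x_adv, g)]
        # project back onto the l_inf ball of radius eps around the clean input
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 2 classes + 1 expert, 2-d inputs
x_clean = [1.0, 0.0]                         # scores [1, 0, -1]: predicted as class 0
weights = [1.0, 0.0, 0.0]                    # y = 0 and the expert is wrong (cost 1)
x_untargeted = pgd_attack(W, x_clean, weights, eps=1.0, step=0.5, n_steps=4)
# x_untargeted == [0.0, 1.0]: scores [0, 1, -1], the prediction flips to class 1
x_targeted = pgd_attack(W, x_clean, weights, eps=2.0, step=0.5, n_steps=10, target=2)
# x_targeted == [-1.0, -2.0]: scores [-1, -2, 3], the input is now deferred to the expert
```

The untargeted attack increases the surrogate and flips the prediction. The targeted attack instead redirects the deferral decision, which is the failure mode that separates L2D from plain classification. Adversarial training would plug such perturbed inputs back into the training objective. The paper's RERM-C and RERM-R algorithms are the authors' regularized versions of that idea and are not reproduced here.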