Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms

Anqi Mao, Mehryar Mohri, Yutao Zhong

TL;DR

The paper addresses multi-class abstention by developing a predictor-rejector framework with novel surrogate losses that come with strong non-asymptotic and realizable consistency guarantees. It establishes both single-stage and two-stage approaches, proving H-consistency bounds for three multi-class surrogates (mean absolute error, $\rho$-hinge, and $\rho$-margin) and showing realizable consistency under scaling-closed hypothesis sets. A central result is that score-based abstention can fail to recover Bayes optimal decisions in some settings, whereas the predictor-rejector approach yields Bayes-optimal solutions with tractable surrogate losses. Empirically, two-stage predictor-rejector methods outperform state-of-the-art score-based baselines on SVHN, CIFAR-10, and CIFAR-100, illustrating practical gains in abstention-aware classification. Overall, the work provides both theoretical guarantees and practical algorithms for robust multi-class abstention, addressing open questions in the literature and offering guidance for deploying abstention-aware models in real-world systems.

Abstract

We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost. We present a series of new theoretical and algorithmic results for this learning problem in the predictor-rejector framework. We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees, thereby resolving positively two existing open questions. These guarantees provide upper bounds on the estimation error of the abstention loss function in terms of that of the surrogate loss. We analyze both a single-stage setting where the predictor and rejector are learned simultaneously and a two-stage setting crucial in applications, where the predictor is learned in a first stage using a standard surrogate loss such as cross-entropy. These guarantees suggest new multi-class abstention algorithms based on minimizing these surrogate losses. We also report the results of extensive experiments comparing these algorithms to the current state-of-the-art algorithms on CIFAR-10, CIFAR-100 and SVHN datasets. Our results demonstrate empirically the benefit of our new surrogate losses and show the remarkable performance of our broadly applicable two-stage abstention algorithm.
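
To make the target loss concrete, here is a minimal sketch of the abstention loss in the predictor-rejector setting. The abstract does not spell out the formula, so the convention used here is an assumption taken from the standard predictor-rejector formulation: the rejector outputs a real score, a score of at most zero means "abstain" and incurs a fixed cost, and otherwise the usual zero-one loss of the predictor applies. The function names, the constant cost, and the toy data are all hypothetical.

```python
from typing import Sequence


def abstention_loss(pred_label: int, reject_score: float,
                    true_label: int, cost: float) -> float:
    """Abstention loss for a single example.

    Assumed convention (not stated in the abstract): reject_score <= 0
    means the learner abstains and pays `cost`; otherwise it pays the
    zero-one loss of the predicted label.
    """
    if reject_score <= 0:
        return cost
    return 0.0 if pred_label == true_label else 1.0


def empirical_abstention_loss(pred_labels: Sequence[int],
                              reject_scores: Sequence[float],
                              true_labels: Sequence[int],
                              cost: float) -> float:
    """Average abstention loss over a sample."""
    losses = [
        abstention_loss(p, r, y, cost)
        for p, r, y in zip(pred_labels, reject_scores, true_labels)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy sample: accept+correct, abstain, accept+wrong, abstain.
    preds = [0, 2, 1, 1]
    scores = [1.5, -0.3, 0.7, -2.0]
    labels = [0, 1, 2, 1]
    print(empirical_abstention_loss(preds, scores, labels, cost=0.25))  # 0.375
```

This loss is piecewise constant in both the predictor and the rejector, which is why it cannot be minimized directly by gradient methods and why the paper studies surrogate losses for it.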

Paper Structure

This paper contains 38 sections, 19 theorems, 96 equations, 1 figure, and 2 tables.

Key Result

Theorem 1

Assume that ${\mathscr H}$ is symmetric and complete, and that ${\mathscr R}$ is complete. If there exists $x \in {\mathscr X}$ such that $\inf_{h \in {\mathscr H}} \mathbb{E}_y\left[\ell(h, X, y) \mid X = x\right] \neq \frac{\beta \Psi\left(1 - \max_{y \in {\mathscr Y}} p(x, y)\right)}{\alpha}$ ...

Figures (1)

  • Figure 1: Counterexample for score-based abstention losses.

Theorems & Definitions (27)

  • Theorem 1: Negative result for single-stage surrogates
  • Theorem 2: $({\mathscr H}, {\mathscr R})$-consistency bounds for single-stage surrogates
  • Corollary 2: Excess error bounds for single-stage surrogates
  • Theorem 3: ${\mathscr R}$-consistency bounds for second-stage surrogates
  • Corollary 4
  • Theorem 5: $({\mathscr H}, {\mathscr R})$-consistency bounds for two-stage approach
  • Corollary 6
  • Definition 7
  • Theorem 8
  • Corollary 9
  • ...and 17 more