Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention

Anqi Mao, Mehryar Mohri, Yutao Zhong

TL;DR

This work advances score-based learning with abstention by developing non-asymptotic, hypothesis-set-specific guarantees (${\mathscr H}$-consistency bounds) that relate the abstention loss to surrogate losses. It introduces a unified single-stage framework with cross-entropy-style surrogates ${\mathsf L}_{\mu}$ (covering the logistic, generalized cross-entropy, and mean absolute error losses), and a novel two-stage formulation that decouples the predictor from the abstention decision and comes with provable realizable-consistency and Bayes-consistency properties. The analysis centers on minimizability gaps ${\mathscr M}_{\mathsf L}(\mathscr H)$ and a transformation between surrogate risks and abstention risks, which yields finite-sample guarantees via Rademacher complexity. Empirical results on CIFAR-10, CIFAR-100, and SVHN show that the two-stage surrogates consistently outperform state-of-the-art cross-entropy abstention losses across datasets, while the relative performance of existing surrogates varies by dataset. Overall, the paper provides a principled framework for designing abstention algorithms with strong theoretical guarantees and practical impact.
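The single-stage family can be illustrated with a minimal Python sketch. It rests on assumptions that are ours rather than quoted from the paper: for $n$ classes the hypothesis outputs $n+1$ scores whose last entry is the abstention score; $\ell_{\mu}$ is the comp-sum loss written in terms of the softmax probability $p$ of a label, namely $-\log p$ for $\mu = 1$ and $(1 - p^{\mu - 1})/(\mu - 1)$ otherwise, so that $\mu \in (1, 2)$ gives generalized cross-entropy and $\mu = 2$ gives the mean absolute error $1 - p$; and the surrogate takes the form ${\mathsf L}_{\mu}(h, x, y) = \ell_{\mu}(h, x, y) + (1 - c)\,\ell_{\mu}(h, x, n+1)$ with abstention cost $c$. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over all n+1 scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ell_mu(p: float, mu: float) -> float:
    """Comp-sum style loss of a label whose softmax probability is p (assumed form).

    mu = 1 -> logistic (cross-entropy) loss -log p,
    1 < mu < 2 -> generalized cross-entropy (1 - p**q) / q with q = mu - 1,
    mu = 2 -> mean absolute error 1 - p.
    """
    if mu == 1.0:
        return -math.log(p)
    return (1.0 - p ** (mu - 1.0)) / (mu - 1.0)


def score_based_surrogate(scores: Sequence[float], y: int, c: float, mu: float) -> float:
    """Hypothetical L_mu(h, x, y) = ell_mu(h, x, y) + (1 - c) * ell_mu(h, x, n+1).

    `scores` has n+1 entries; the last index is the abstention score.
    Class labels y are 0-based, in 0..n-1.
    """
    p = softmax(scores)
    abstain = len(scores) - 1
    return ell_mu(p[y], mu) + (1.0 - c) * ell_mu(p[abstain], mu)


# Two classes plus the abstain option, all scores equal, cost c = 0.3.
print(score_based_surrogate([0.0, 0.0, 0.0], y=0, c=0.3, mu=1.0))  # logistic member
print(score_based_surrogate([0.0, 0.0, 0.0], y=0, c=0.3, mu=2.0))  # MAE member
```

With $\mu = 1$ the sketch reduces to the familiar cross-entropy abstention surrogate $-\log p_y - (1 - c)\log p_{n+1}$; moving $\mu$ toward 2 trades the unbounded logistic penalty for the bounded MAE penalty.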

Abstract

Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. We prove strong non-asymptotic and hypothesis set-specific consistency guarantees for these surrogate losses, which upper-bound the estimation error of the abstention loss function in terms of the estimation error of the surrogate loss. Our bounds can help compare different score-based surrogates and guide the design of novel abstention algorithms by minimizing the proposed surrogate losses. We experimentally evaluate our new algorithms on the CIFAR-10, CIFAR-100, and SVHN datasets and demonstrate the practical significance of our new surrogate losses and two-stage abstention algorithms. Our results also show that the relative performance of the state-of-the-art score-based surrogate losses can vary across datasets.
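As a concrete reading of the score-based formulation, the following minimal Python sketch assumes that, for $n$ classes, a hypothesis outputs $n+1$ scores, predicts the label with the largest score, and abstains, paying a fixed cost $c$, when the extra $(n+1)$-th score is the largest; otherwise it incurs the usual zero-one loss. The function names and the tie-breaking rule (smallest index wins) are our own illustrative choices, not the paper's code.

```python
from typing import Sequence, Tuple


def predict(scores: Sequence[float]) -> int:
    """Index of the largest score; ties go to the smallest index (an assumption).

    With n classes, `scores` holds n+1 entries and index n stands for "abstain".
    """
    values = list(scores)
    return values.index(max(values))


def abstention_loss(scores: Sequence[float], y: int, c: float) -> float:
    """Score-based abstention loss: cost c if the learner abstains, else zero-one loss."""
    n = len(scores) - 1               # number of real classes; index n is the abstain option
    pred = predict(scores)
    if pred == n:
        return c                      # abstained: pay the fixed cost c
    return 0.0 if pred == y else 1.0  # predicted a class: ordinary zero-one loss


def empirical_abstention_risk(batch: Sequence[Tuple[Sequence[float], int]], c: float) -> float:
    """Average abstention loss over (scores, label) pairs."""
    return sum(abstention_loss(s, y, c) for s, y in batch) / len(batch)


# Two classes plus the abstain option: a correct prediction, an error, an abstention.
batch = [([2.0, 1.0, 0.5], 0), ([2.0, 1.0, 0.5], 1), ([0.1, 0.2, 3.0], 0)]
print(empirical_abstention_risk(batch, c=0.3))  # (0 + 1 + 0.3) / 3
```

Because abstaining costs $c$ regardless of the true label, this loss is discontinuous in the scores, which is why the surrogate losses discussed above are minimized instead.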

Paper Structure (32 sections, 11 theorems, 59 equations, 1 table)

Key Result

theorem 1

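Assume that ${\mathscr H}$ is symmetric and complete. Then, for any hypothesis $h \in {\mathscr H}$ and any distribution ${\mathscr D}$, the following inequality holds:
$${\mathscr E}_{{\mathsf L}_{\rm abs}}(h) - {\mathscr E}^*_{{\mathsf L}_{\rm abs}}({\mathscr H}) + {\mathscr M}_{{\mathsf L}_{\rm abs}}({\mathscr H}) \le \Gamma_{\mu}\big({\mathscr E}_{{\mathsf L}_{\mu}}(h) - {\mathscr E}^*_{{\mathsf L}_{\mu}}({\mathscr H}) + {\mathscr M}_{{\mathsf L}_{\mu}}({\mathscr H})\big),$$
where $\Gamma_{\mu}(t)=$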
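For reference, the minimizability gaps that enter ${\mathscr H}$-consistency bounds are usually defined as follows in this line of work, stated here in our notation, with ${\mathscr E}_{\mathsf L}(h) = \mathbb E_{(x, y) \sim {\mathscr D}}[{\mathsf L}(h, x, y)]$ and ${\mathscr E}^*_{\mathsf L}({\mathscr H}) = \inf_{h \in {\mathscr H}} {\mathscr E}_{\mathsf L}(h)$:

```latex
\mathscr{M}_{\mathsf{L}}(\mathscr{H})
  = \mathscr{E}^{*}_{\mathsf{L}}(\mathscr{H})
  - \mathbb{E}_{x}\Big[\inf_{h \in \mathscr{H}} \mathbb{E}_{y}\big[\mathsf{L}(h, x, y) \mid x\big]\Big]
```

The gap is nonnegative, since the infimum of an expectation is at least the expectation of the pointwise infimum, and it vanishes when ${\mathscr H}$ is the family of all measurable functions; for restricted hypothesis sets it can be positive, which is why it appears on both sides of such bounds.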

Theorems & Definitions (18)

  • theorem 1: ${\mathscr H}$-consistency bounds for score-based surrogates
  • theorem 2: Characterization of minimizability gaps
  • theorem 3
  • theorem 4: ${\mathscr H}$-consistency bounds for two-stage surrogates
  • lemma 1
  • proof
  • corollary 1
  • theorem 4: ${\mathscr H}$-consistency bounds for score-based surrogates
  • proof
  • theorem 4: Characterization of minimizability gaps
  • ...and 8 more