Analyzing Cost-Sensitive Surrogate Losses via $\mathcal{H}$-calibration

Sanket Shah, Milind Tambe, Jessie Finocchiaro

TL;DR

The paper asks whether training with cost-sensitive surrogates yields better task performance than training with cost-agnostic surrogates and then post-processing. Using the framework of $\mathcal{H}$-calibration, it shows that cross-entropy can fail to be $\mathcal{H}$-consistent for cost-sensitive targets, while specially designed Embeddings surrogates achieve $\mathcal{H}$-consistency under $P$-minimizability conditions. The theoretical results are complemented by experiments on synthetic data and UCI cost-sensitive tasks, where Embeddings and other cost-sensitive surrogates consistently outperform cost-agnostic approaches with thresholding. The findings provide both a theoretical rationale and practical guidance for adopting cost-sensitive surrogate losses in small-model regimes, and indicate promising directions for decision-focused learning and structured prediction. The work highlights the practical impact of surrogate design on reducing misclassification costs in real-world settings, while acknowledging distributional limitations and areas for future research.

Abstract

This paper aims to understand whether machine learning models should be trained using cost-sensitive surrogates or cost-agnostic ones (e.g., cross-entropy). Analyzing this question through the lens of $\mathcal{H}$-calibration, we find that cost-sensitive surrogates can strictly outperform their cost-agnostic counterparts when learning small models under common distributional assumptions. Since these distributional assumptions are hard to verify in practice, we also show that cost-sensitive surrogates consistently outperform cost-agnostic surrogates on classification datasets from the UCI repository. Together, these make a strong case for using cost-sensitive surrogates in practice.
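To make the two approaches being compared concrete, here is a minimal Python sketch for the binary case. It is illustrative only and not the paper's method: the function names, the cost parameters `c_fp` and `c_fn`, and the cost-weighted cross-entropy are assumptions introduced here, and the weighted loss is a generic cost-sensitive surrogate rather than the Embeddings surrogate the paper analyzes. The thresholding rule is the standard expected-cost argument, which is exact only when the predicted probability equals the true $P(y = 1 \mid x)$.

```python
import math


def bayes_threshold(c_fp: float, c_fn: float) -> float:
    """Threshold on p = P(y=1|x) that minimizes expected cost.

    Predicting 1 costs c_fp * (1 - p) in expectation; predicting 0 costs
    c_fn * p. Predicting 1 is cheaper exactly when p >= c_fp / (c_fp + c_fn).
    """
    return c_fp / (c_fp + c_fn)


def expected_cost(pred: int, p_true: float, c_fp: float, c_fn: float) -> float:
    """Expected misclassification cost of a hard prediction at one input."""
    return c_fp * (1.0 - p_true) if pred == 1 else c_fn * p_true


def cross_entropy(p: float, y: int) -> float:
    """Cost-agnostic surrogate: binary cross-entropy, for p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def weighted_cross_entropy(p: float, y: int, c_fp: float, c_fn: float) -> float:
    """Hypothetical cost-sensitive surrogate: cross-entropy with each class
    term scaled by the cost of misclassifying that class."""
    return -(c_fn * y * math.log(p) + c_fp * (1 - y) * math.log(1.0 - p))


# Approach 1 (cost-agnostic + post-processing): train with cross_entropy,
# then threshold the predicted probability at bayes_threshold(c_fp, c_fn).
# Approach 2 (cost-sensitive): train directly with a loss that sees the costs.
tau = bayes_threshold(c_fp=1.0, c_fn=3.0)
```

With `c_fp = 1` and `c_fn = 3`, the threshold is `0.25`. The paper's point is that when the hypothesis class is restricted, the probabilities learned in the first approach need not be accurate, so thresholding them afterwards may not recover the cost-optimal classifier.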

Paper Structure

This paper contains 17 sections, 2 theorems, 12 equations, 1 figure, 5 tables.

Key Result

Theorem 5

Given a surrogate loss $L : \mathbb{R}^d \times \mathcal{Y} \to \mathbb{R}_+$, link function $\psi: \mathbb{R}^d \to \mathcal{R}$, and target loss $\ell : \mathcal{R} \times \mathcal{Y}\to \mathbb{R}_+$, assume that $P \in \mathcal{Q}_{L, \mathcal{H}} \cap \mathcal{Q}_{\ell,\mathcal{H}}$. Furthermore, for all $x \in \mathcal{X}$, $\epsilon > 0$, and $h \in \mathcal{H}$. Then for all $\epsilon > 0$,

Figures (1)

  • Figure 1: A simple cost-sensitive binary classification task in which neither the cost-agnostic classifier (magenta dashed line $\psi \circ h^{ag}$) nor its post-processed counterpart (purple dot-dashed classifier $\psi_{\tau^*} \circ h^{ag}$) yields the optimal model. Instead, information about the cost matrix must be incorporated into the training algorithm, e.g., by using a cost-sensitive loss function (orange dotted classifier $h^{cs}$ for $\alpha = \frac{1}{4}$). Data distribution given in Example \ref{ex:binary-csc}.

Theorems & Definitions (8)

  • Definition 1: $\mathcal{H}$-consistency
  • Definition 2: $P$-minimizability
  • Definition 3: Conditional risk
  • Definition 4: $\mathcal{H}$-calibration
  • Theorem 5: steinwart_how_2007
  • Example 1
  • Definition 6: finocchiaro_embedding_2024
  • Corollary 7: Embeddings are $\mathcal{H}$-consistent