On Computationally Efficient Multi-Class Calibration

Parikshit Gopalan; Lunjia Hu; Guy N. Rothblum

TL;DR

The paper addresses multiclass calibration under computational and sample-efficiency constraints, introducing projected smooth calibration as a robust, expressive, yet tractable notion. It shows that auditing predictors for this form of calibration can be done in time polynomial in $k$ for any fixed error tolerance, by connecting auditing to binary agnostic learning and leveraging kernel methods. It also shows that stronger notions (e.g., full smooth calibration) incur exponential sample complexity. A tight equivalence between auditing and agnostic learning enables the transfer of both algorithmic and hardness results, yielding efficient procedures for projected smooth and sigmoid calibration as well as hardness results for decision calibration and halfspace-like tasks. The work also draws a clear boundary between expressivity and efficiency: kernel-based auditors provide practical tools for downstream binary decisions derived from multiclass predictions, and explicit lower bounds clarify the limits of feasibility. Overall, the framework advances robust downstream-calibration guarantees for multiclass predictions, with practical, scalable auditing and post-processing.

Abstract

Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in $k$, or need to solve computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form: does the label belong to a subset $T \subseteq [k]$? (e.g., is this an image of an animal?) It ensures that the probabilities predicted by summing the probabilities assigned to labels in $T$ are close to some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information-theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting.
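The binary view described above can be made concrete for a single subset $T$. The sketch below is a minimal illustration under stated assumptions, not the paper's recalibration or auditing algorithm: it collapses each prediction to $p_T = \sum_{i \in T} p_i$ and each label to $\mathbf{1}[y \in T]$, then lower-bounds the empirical binary smooth calibration error, taken here in its standard form $\sup_w \mathbb{E}[w(p_T)(\mathbf{1}[y \in T] - p_T)]$ over 1-Lipschitz $w : [0,1] \to [-1,1]$, with a dynamic program over a grid of witness values. The function names (`project_to_subset`, `window_max`, `smooth_ce_lower_bound`) and the `grid` parameter are hypothetical, and the code checks only one fixed $T$, whereas the guarantee above concerns every subset.

```python
from collections import deque


def project_to_subset(probs, labels, subset):
    """Collapse k-class predictions to the binary task "is the label in T?".

    probs: list of length-k probability vectors; labels: ints in range(k);
    subset: the set T (hypothetical helper, for illustration only).
    Returns (v, y) with v_j = sum_{i in T} probs[j][i] and y_j = 1[labels[j] in T].
    """
    v = [sum(p[i] for i in subset) for p in probs]
    y = [1.0 if lab in subset else 0.0 for lab in labels]
    return v, y


def window_max(f, s):
    """out[a] = max(f[b] for all b with |a - b| <= s), via a monotone deque."""
    out, dq, right = [], deque(), 0
    for a in range(len(f)):
        while right <= min(len(f) - 1, a + s):  # admit indices up to a + s
            while dq and f[dq[-1]] <= f[right]:
                dq.pop()
            dq.append(right)
            right += 1
        while dq[0] < a - s:  # evict indices below a - s
            dq.popleft()
        out.append(f[dq[0]])
    return out


def smooth_ce_lower_bound(v, y, grid=200):
    """Lower bound on the empirical binary smooth calibration error

        max_w (1/n) * sum_j w(v_j) * (y_j - v_j)

    over 1-Lipschitz w: [0, 1] -> [-1, 1]. Witness values are restricted to a
    grid on [-1, 1]; between consecutive sorted predictions the witness may move
    only as many grid steps as fit inside the gap, so every grid solution is a
    feasible Lipschitz witness and the result never exceeds the true supremum.
    """
    n = len(v)
    order = sorted(range(n), key=lambda j: v[j])
    levels = [-1.0 + 2.0 * g / grid for g in range(grid + 1)]
    j0 = order[0]
    # best[g]: largest partial sum so far given the current witness value levels[g]
    best = [w * (y[j0] - v[j0]) for w in levels]
    for prev, cur in zip(order, order[1:]):
        steps = int((v[cur] - v[prev]) * grid / 2.0)  # grid steps allowed by the gap
        reach = window_max(best, steps)
        r = y[cur] - v[cur]
        best = [reach[g] + levels[g] * r for g in range(grid + 1)]
    return max(best) / n


if __name__ == "__main__":
    probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
    labels = [0, 1, 2]
    v, y = project_to_subset(probs, labels, {1, 2})  # T = {1, 2}
    print(smooth_ce_lower_bound(v, y))
```

A note on the design choice: restricting $w$ to its values at the sorted sample points turns the supremum into a chain-structured linear program, because the Lipschitz condition between neighbouring points implies it for all pairs. The grid DP solves a feasible restriction of that program in $O(n \cdot \texttt{grid})$ time after sorting, so its output lower-bounds the empirical supremum (up to floating-point rounding) and approaches it as `grid` is refined.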

Paper Structure

This paper contains 31 sections, 44 theorems, 136 equations, and 2 algorithms.

Introduction
Multi-class calibration.
Our Contributions
Weighted calibration.
Projected smooth calibration.
Lower bounds for stronger notions.
Equivalence between auditing and agnostic learning.
Further Discussion of Related Work
Organization
Multi-Class Calibration
Canonical Calibration.
Weighted Calibration.
Class-wise, Confidence, and Top-label Calibration.
Decision Calibration.
Smooth Calibration.
...and 16 more sections

Key Result

Theorem 1.2

There is an algorithm for deciding whether the projected smooth calibration error is at most $\alpha$, with sample complexity and running time $O(k^{O(1/\alpha)})$.
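Read with the error tolerance $\alpha$ held fixed, this bound is polynomial in $k$. The worked comparison below (the constants hidden by the $O$-notation are left unspecified) contrasts it with the number of nontrivial subset tasks $T$ that a naive audit examining each subset separately would have to visit; $T = \emptyset$ and $T = [k]$ are excluded because their projected predictions and labels are constant ($0$ and $1$ respectively), so they can never be miscalibrated:

$$\bigl|\{T \subseteq [k] : \emptyset \neq T \neq [k]\}\bigr| = 2^k - 2, \qquad k^{O(1/\alpha)} = k^{O(1)} \ \text{ for each fixed } \alpha > 0.$$

For example, at $\alpha = 0.1$ the exponent is $O(10)$ regardless of $k$, whereas $2^k - 2$ grows exponentially in $k$; the price of the polynomial bound is that its degree grows like $1/\alpha$.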

Theorems & Definitions (88)

Definition 1.1: Projected smooth calibration, informal statement of \ref{def:p-smooth}
Theorem 1.2: Efficient auditing, informal statement of \ref{thm:p-smooth}
Theorem 1.3: Informal statement of \ref{thm:hard-psmooth}
Theorem 1.4: Informal statement of \ref{thm:full}
Theorem 1.5: Informal statement of \ref{thm:canonical}
Theorem 1.6: Informal statement of \ref{thm:red} and \ref{thm:product-hardness}
Definition 2.1
Definition 2.2: Weighted calibration [GopalanKSZ22]
Definition 2.3: Decision Calibration [zhao2021calibrating]
Definition 2.4: Subset Smooth Calibration
...and 78 more
