On Calibration in Multi-Distribution Learning

Rajeev Verma, Volker Fischer, Eric Nalisnick

TL;DR

This paper analyzes calibration in multi-distribution learning (MDL), where a single predictor must perform well across a set of distributions $\mathcal{Q}$. It shows that the Bayes optimal rule for MDL maximizes the loss's generalized entropy $H_{\ell}$, yielding a saddle-point solution $h^*(x)=Q^*(y \mid x)$. Even at this optimum, however, calibration gaps can differ across the distributions in $\mathcal{Q}$, revealing a fundamental calibration-refinement trade-off. The work connects calibration to decision-making under MDL via proper scoring losses and their risk decomposition, and discusses implications for distributionally robust optimization (DRO) and fairness: calibration disparity complicates worst-case guarantees and may call for post-processing or a careful choice of divergence. These insights highlight practical limits of MDL for robust and equitable decision-making and point to future work on empirical validation, divergence design, and algorithmic remedies for calibration disparity.
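
To make the saddle-point claim concrete, the sketch below checks it numerically in a deliberately tiny setting that is an assumption of this example, not the paper's setup: binary labels, no features, log loss (whose generalized entropy $H_{\ell}$ is Shannon entropy), and a hypothetical $\mathcal{Q}$ equal to the convex hull of Bernoulli(0.1) and Bernoulli(0.3). A grid search over constant forecasts finds the minimax forecast, which should coincide with the maximum-entropy member of $\mathcal{Q}$; the calibration error (here a KL divergence) is then evaluated on each source distribution separately. All helper names are illustrative.

```python
import numpy as np

def bernoulli_entropy(q):
    """Shannon entropy (nats) of Bernoulli(q): the generalized entropy of log loss."""
    return -(q * np.log(q) + (1 - q) * np.log(1 - q))

def cross_entropy(q, h):
    """Expected log loss E_{y ~ Bern(q)}[-log h(y)] of a constant forecast h."""
    return -(q * np.log(h) + (1 - q) * np.log(1 - h))

def kl(q, h):
    """KL(Bern(q) || Bern(h)): the calibration term of log loss for a constant forecast."""
    return cross_entropy(q, h) - bernoulli_entropy(q)

# Hypothetical uncertainty set: convex hull of two label marginals (features ignored).
q1, q2 = 0.1, 0.3
mixtures = np.linspace(q1, q2, 201)        # candidate members Q of the hull
forecasts = np.linspace(0.01, 0.99, 981)   # candidate constant forecasts h

# Minimax: pick the forecast with the smallest worst-case expected loss over the hull.
worst = np.array([cross_entropy(mixtures, h).max() for h in forecasts])
h_star = forecasts[worst.argmin()]

# Maximum-entropy member of the hull, for comparison with h_star.
q_star = mixtures[bernoulli_entropy(mixtures).argmax()]

print(f"minimax forecast h*     = {h_star:.2f}")
print(f"max-entropy member Q*   = {q_star:.2f}")
print(f"calibration error on P1 = {kl(q1, h_star):.4f}")
print(f"calibration error on P2 = {kl(q2, h_star):.4f}")
```

With these numbers the minimax forecast sits at the hull's maximum-entropy point, 0.3, so it is calibrated for the second distribution but not for the first (a KL gap of roughly 0.116 nats): a toy instance of calibration disparity at the optimum.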

Abstract

Modern challenges of robustness, fairness, and decision-making in machine learning have led to the formulation of multi-distribution learning (MDL) frameworks in which a predictor is optimized across multiple distributions. We study the calibration properties of MDL to better understand how the predictor performs uniformly across the multiple distributions. Through classical results on decomposing proper scoring losses, we first derive the Bayes optimal rule for MDL, demonstrating that it maximizes the generalized entropy of the associated loss function. Our analysis reveals that while this approach ensures minimal worst-case loss, it can lead to non-uniform calibration errors across the multiple distributions, and that there is an inherent calibration-refinement trade-off, even at Bayes optimality. Our results highlight a critical limitation: despite the promise of MDL, one must use caution when designing predictors tailored to multiple distributions so as to minimize disparity.
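
The calibration-refinement split of a proper scoring loss can be illustrated with the binary Brier score $\ell(h, y) = (h - y)^2$, for which the generalized entropy is $H_{\ell}(q) = q(1-q)$ and the calibration term is $(h - q)^2$ with $q = Q(y = 1 \mid h)$. The sketch below is a minimal empirical version, assuming forecasts take finitely many values so that $Q(y = 1 \mid h)$ can be estimated by grouping on exact forecast values (continuous forecasts would need binning). The function name and the two-group data are hypothetical.

```python
import numpy as np

def brier_decomposition(forecasts, labels):
    """Split the mean binary Brier score into calibration + refinement terms.

    Groups samples by identical forecast value h, estimates Q(y = 1 | h) as the
    empirical label frequency in each group, and uses the identity
        E[(h - y)^2 | h] = (h - Q)^2 + Q (1 - Q).
    """
    forecasts = np.asarray(forecasts, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(labels)
    calibration, refinement = 0.0, 0.0
    for h in np.unique(forecasts):
        mask = forecasts == h
        weight = mask.sum() / n               # empirical P(forecast = h)
        q = labels[mask].mean()               # empirical Q(y = 1 | forecast = h)
        calibration += weight * (h - q) ** 2  # calibration gap for squared loss
        refinement += weight * q * (1 - q)    # generalized entropy H(q) = q(1 - q)
    total = np.mean((forecasts - labels) ** 2)
    return total, calibration, refinement

# Hypothetical example: one constant forecast shared by two groups whose
# label rates differ (10% vs 30% positives).
group_a = brier_decomposition([0.3] * 10, [1] + [0] * 9)
group_b = brier_decomposition([0.3] * 10, [1, 1, 1] + [0] * 7)
print("group A (total, calibration, refinement):", np.round(group_a, 4))
print("group B (total, calibration, refinement):", np.round(group_b, 4))
```

Both groups receive the same forecast 0.3, yet group A has a calibration term of 0.04 and refinement of 0.09, while group B has 0.0 and 0.21: the same predictor splits its loss differently across groups.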

Paper Structure

This paper contains 31 sections, 9 theorems, 9 equations, 1 figure.

Key Result

Proposition 2.2

(Calibration and decision-making) [let4allcalibrationblog]. Given $\mathcal{X} \times \mathcal{Y}$ with a distribution $P$ on it, a (finite) action space $\mathcal{A}$, and an arbitrary cost function $c: \mathcal{Y} \times \mathcal{A} \rightarrow \mathbb{R}_{+}$, the decision rule $\delta$ defined on forecasts $h(\cdot)$ …
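
Because the statement above is truncated, the sketch below does not reproduce it. It only illustrates, under assumed values, the usual plug-in rule $\delta(h) = \arg\min_{a \in \mathcal{A}} \mathbb{E}_{y \sim h}[c(y, a)]$ and the standard observation that the cost a forecast predicts for its own decision equals the realized cost whenever $Q(y \mid h) = h$. The cost matrix, the forecast value, and the helper names are all hypothetical.

```python
import numpy as np

# Hypothetical 2-label, 2-action problem: cost[y, a] = c(y, a) >= 0.
cost = np.array([[0.0, 1.0],    # y = 0: action 0 is free, action 1 costs 1
                 [5.0, 1.0]])   # y = 1: action 0 costs 5, action 1 costs 1

def plug_in_decision(h):
    """delta(h): the action minimizing expected cost when y ~ Bern(h)."""
    probs = np.array([1 - h, h])
    expected = probs @ cost          # expected cost of each action under h
    return int(expected.argmin()), float(expected.min())

def realized_cost(h, q):
    """Expected cost of delta(h) when labels actually follow Bern(q)."""
    action, _ = plug_in_decision(h)
    return (1 - q) * cost[0, action] + q * cost[1, action]

for q in (0.1, 0.3):   # true Q(y = 1 | h) in two hypothetical groups
    action, predicted = plug_in_decision(0.1)
    print(f"q={q}: action={action}, predicted cost={predicted:.2f}, "
          f"realized cost={realized_cost(0.1, q):.2f}")
```

With the forecast fixed at 0.1, the group whose true rate is 0.1 sees predicted and realized cost both equal to 0.5, whereas a group with true rate 0.3 pays 1.5 for the same action, even though action 1 would have cost it only 1.0.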

Figures (1)

  • Figure 1: Calibration disparity intuition in MDL. For a forecast $h_{\bm{x}}$, the calibration error is given by the (generalized) entropy function $H_{\ell}$: it is the value of the tangent hyperplane to $H_{\ell}$ at $h_{\bm{x}}$, evaluated at $Q(\textnormal{y} \mid h_{\bm{x}})$, minus $H_{\ell}(Q(\textnormal{y} \mid h_{\bm{x}}))$.
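
Written out, and assuming $H_{\ell}$ is concave and differentiable (the standard Savage representation of a proper loss, stated here as background rather than quoted from the paper), the quantity in Figure 1 with $Q = Q(\textnormal{y} \mid h_{\bm{x}})$ is a Bregman-type gap. For log loss, where $H_{\ell}(q) = -\sum_y q(y)\log q(y)$ and $\partial H_{\ell}(h)/\partial h(y) = -\log h(y) - 1$, it reduces to a KL divergence:

$$
\mathrm{CE}(h_{\bm{x}}) = \underbrace{H_{\ell}(h_{\bm{x}}) + \big\langle \nabla H_{\ell}(h_{\bm{x}}),\, Q - h_{\bm{x}} \big\rangle}_{\text{tangent hyperplane at } h_{\bm{x}} \text{, evaluated at } Q} - H_{\ell}(Q) \;\ge\; 0,
$$

$$
H_{\ell}(h_{\bm{x}}) - \sum_{y}\big(\log h_{\bm{x}}(y) + 1\big)\big(Q(y) - h_{\bm{x}}(y)\big) - H_{\ell}(Q) = \sum_{y} Q(y)\log\frac{Q(y)}{h_{\bm{x}}(y)} = \mathrm{KL}\big(Q \,\|\, h_{\bm{x}}\big).
$$

The gap is nonnegative because a concave function lies below each of its tangent hyperplanes, and it vanishes exactly when $h_{\bm{x}} = Q$, i.e. when the forecast is calibrated.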

Theorems & Definitions (17)

  • Definition 2.1
  • Proposition 2.2
  • Definition 2.3
  • Definition 2.4
  • Lemma 2.5
  • Definition 3.1
  • Proposition 3.2
  • Proof
  • Proposition 4.1
  • Proof
  • ...and 7 more