MCGrad: Multicalibration at Web Scale

Niek Tax, Lorenzo Perini, Fridolin Linder, Daniel Haimovich, Dima Karamshuk, Nastaran Okati, Milan Vojnovic, Pavlos Athanasios Apostolopoulos

TL;DR

MCGrad tackles the challenge of achieving multicalibration at web scale by eliminating the need to manually specify protected groups and by leveraging multi-round gradient boosting with an augmented input that includes the base predictor’s outputs. The method uses efficient, low-latency training and a logit-rescaling step to maintain calibration without harming log loss or PRAUC, and it employs early stopping and Hessian-based leaf regularization to guard against overfitting. Empirically, MCGrad demonstrates strong multicalibration improvements on public benchmarks and achieves substantial, consistent gains in production settings at Meta across hundreds of models, with favorable latency characteristics for online deployment. The work provides practical guidance for industry adoption and links multicalibration to broader performance metrics, underscoring the real-world value of calibrated, subgroup-aware predictions.
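The logit-rescaling step mentioned above can be illustrated with a small sketch. It assumes the rescaling is a single scalar multiplier on the logits chosen to minimize log loss; the function names, the bisection search, and the search interval `[0, 10]` are illustrative assumptions, not taken from the MCGrad implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(logits, labels, scale=1.0):
    """Mean binary log loss of sigmoid(scale * logit)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(scale * z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fit_logit_scale(logits, labels, lo=0.0, hi=10.0, iters=60):
    """Find the scalar theta in [lo, hi] minimizing log loss of sigmoid(theta * logit).

    The loss is convex in theta, so its derivative is monotone and
    bisection on the derivative's sign is sufficient (hypothetical helper).
    """
    def grad(theta):
        return sum((sigmoid(theta * z) - y) * z for z, y in zip(logits, labels))

    if grad(lo) >= 0:
        return lo
    if grad(hi) <= 0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Toy data: logits of +/-0.5 that are right 75% of the time (underconfident).
logits = [0.5] * 4 + [-0.5] * 4
labels = [1, 1, 1, 0, 0, 0, 0, 1]

theta = fit_logit_scale(logits, labels)
loss_before = log_loss(logits, labels, scale=1.0)
loss_after = log_loss(logits, labels, scale=theta)
```

On this toy data the fitted multiplier stretches the logits until the predicted probability matches the observed 75% hit rate, so the log loss after rescaling is lower than before.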

Abstract

We propose MCGrad, a novel and scalable multicalibration algorithm. Multicalibration (calibration in subgroups of the data) is an important property for the performance of machine learning-based systems. Existing multicalibration methods have thus far gained limited traction in industry. We argue that this is because existing methods (1) require such subgroups to be manually specified, which ML practitioners often struggle with, (2) are not scalable, or (3) may harm other notions of model performance such as log loss and Area Under the Precision-Recall Curve (PRAUC). MCGrad does not require explicit specification of protected groups, is scalable, and often improves other ML evaluation metrics instead of harming them. MCGrad has been in production at Meta, and is now part of hundreds of production models. We present results from these deployments as well as results on public datasets. We provide an open source implementation of MCGrad at https://github.com/facebookincubator/MCGrad.
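To make "calibration in subgroups" concrete, the sketch below compares the mean prediction with the observed positive rate, first on all data and then within each group. This is a deliberately coarse check on invented toy data, not the paper's multicalibration error metric; the helper names are hypothetical.

```python
from collections import defaultdict


def calibration_gap(preds, labels):
    """Absolute gap between the mean prediction and the observed positive rate."""
    return abs(sum(preds) / len(preds) - sum(labels) / len(labels))


def subgroup_gaps(preds, labels, groups):
    """Compute the calibration gap separately for each group label."""
    buckets = defaultdict(lambda: ([], []))
    for p, y, g in zip(preds, labels, groups):
        buckets[g][0].append(p)
        buckets[g][1].append(y)
    return {g: calibration_gap(ps, ys) for g, (ps, ys) in buckets.items()}


# Toy data: a constant predictor of 0.5; group "a" is 80% positive, group "b" is 20% positive.
groups = ["a"] * 10 + ["b"] * 10
labels = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
preds = [0.5] * 20

overall = calibration_gap(preds, labels)
per_group = subgroup_gaps(preds, labels, groups)
```

Here the predictor looks perfectly calibrated on the full dataset, yet it is off by roughly 0.3 in each group. Closing that kind of gap across many subgroups is what multicalibration asks for.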

Paper Structure

This paper contains 36 sections, 4 theorems, 31 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

Proposition A.1

Given a set $\mathcal{H}$ of group membership functions, a probabilistic predictor $f$ is $\alpha$-multicalibrated with respect to $\mathcal{H}$, with the scale parameter $\tau_h(f)$, if and only if $\text{MCE}(f) \le \alpha \sqrt{n}$.

Figures (3)

  • Figure 1: Multicalibration Error computed using unspecified groups (left) or the manually prespecified groups (right) for all compared methods on each benchmark dataset. Overall, MCGrad achieves a better (lower) error for $10$ out of $11$ datasets when tested on unspecified groups, and for $5$ out of $11$ datasets when tested on prespecified groups.
  • Figure 2: Improvement (in %) of MCGrad's variants relative to the original version. While setting $T = 1$ yields a significant drop in performance, the effects of the rescaling factor and of the min sum Hessian in leaf are mild due to the datasets' limited size.
  • Figure 3: Computational time of multicalibration algorithms with varying numbers of groups. (Left) Fit time and (Right) predict time in seconds as a function of the number of groups on a log-log scale. Runtime for DFMC and HKRR increases with the number of groups, while MCGrad is constant.

Theorems & Definitions (9)

  • Definition 1: Multicalibration
  • Proposition A.1
  • proof
  • Proposition B.1
  • proof
  • Lemma B.1
  • proof
  • Proposition B.2
  • proof