High-Dimensional Calibration from Swap Regret

Maxwell Fishelson, Noah Golowich, Mehryar Mohri, Jon Schneider

TL;DR

This work studies online calibration of high-dimensional forecasts over a convex set P and connects it to online linear optimization (OLO) via swap regret. By reframing calibration as a swap-regret minimization problem and instantiating a TreeSwap-based approach, the authors introduce TreeCal, a universal calibration algorithm that attains Cal_T^{||·||^2} ≤ εT once T is at least (diam(P)/√ε)^{O(Rate(P, ||·||)/ε)}. The analysis shows that the optimal regularizer for OLO governs the calibration error, which yields simultaneous guarantees for any norm and any convex P without relying on external OLO subroutines. For the simplex with the ℓ1 norm, the results recover an existing bound of ε-calibration in d^{O(1/ε^2)} rounds, and a lower bound shows that exponential dependence on 1/ε is necessary in general. Overall, the paper bridges calibration and OLO theory with a single norm-agnostic framework that has both high-dimensional guarantees and matching limitations.

Abstract

We study the online calibration of multi-dimensional forecasts over an arbitrary convex set $\mathcal{P} \subset \mathbb{R}^d$ relative to an arbitrary norm $\Vert\cdot\Vert$. We connect this with the problem of external regret minimization for online linear optimization, showing that if it is possible to guarantee $O(\sqrt{ρT})$ worst-case regret after $T$ rounds when actions are drawn from $\mathcal{P}$ and losses are drawn from the dual $\Vert \cdot \Vert_*$ unit norm ball, then it is also possible to obtain $ε$-calibrated forecasts after $T = \exp(O(ρ/ε^2))$ rounds. When $\mathcal{P}$ is the $d$-dimensional simplex and $\Vert \cdot \Vert$ is the $\ell_1$-norm, the existence of $O(\sqrt{T\log d})$-regret algorithms for learning with experts implies that it is possible to obtain $ε$-calibrated forecasts after $T = \exp(O(\log{d}/ε^2)) = d^{O(1/ε^2)}$ rounds, recovering a recent result of Peng (2025). Interestingly, our algorithm obtains this guarantee without requiring access to any online linear optimization subroutine or knowledge of the optimal rate $ρ$ -- in fact, our algorithm is identical for every setting of $\mathcal{P}$ and $\Vert \cdot \Vert$. Instead, we show that the optimal regularizer for the above OLO problem can be used to upper bound the above calibration error by a swap regret, which we then minimize by running the recent TreeSwap algorithm with Follow-The-Leader as a subroutine. Finally, we prove that any online calibration algorithm that guarantees $εT$ $\ell_1$-calibration error over the $d$-dimensional simplex requires $T \geq \exp(\mathrm{poly}(1/ε))$ (assuming $d \geq \mathrm{poly}(1/ε)$). This strengthens the corresponding $d^{Ω(\log{1/ε})}$ lower bound of Peng, and shows that an exponential dependence on $1/ε$ is necessary.
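
To make the quantity being bounded concrete, here is a minimal Python sketch of an $\ell_1$-calibration error computation. It assumes the standard definition, in which rounds are grouped by the forecast made and the error is the sum over distinct forecasts $p$ of $\Vert \sum_{t : p_t = p} (p_t - y_t) \Vert_1$; the paper's exact normalization may differ. The function name `l1_calibration_error` and the rounding used to group forecasts are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

import numpy as np


def l1_calibration_error(forecasts, outcomes):
    """Sum over distinct forecasts p of || sum_{t: p_t = p} (p_t - y_t) ||_1.

    Assumed (standard) definition; hypothetical helper, not the paper's code.
    """
    residuals = defaultdict(lambda: 0.0)
    for p, y in zip(forecasts, outcomes):
        p = np.asarray(p, dtype=float)
        y = np.asarray(y, dtype=float)
        # Group rounds by forecast; rounding guards against float noise.
        key = tuple(np.round(p, 12))
        residuals[key] = residuals[key] + (p - y)
    return float(sum(np.abs(r).sum() for r in residuals.values()))


# Forecasting (0.5, 0.5) while outcomes alternate is perfectly calibrated.
calibrated = l1_calibration_error(
    [[0.5, 0.5]] * 4, [[1, 0], [0, 1], [1, 0], [0, 1]]
)
# Forecasting (0.5, 0.5) while the first outcome always occurs is not.
miscalibrated = l1_calibration_error([[0.5, 0.5]] * 2, [[1, 0], [1, 0]])
```

Under this definition, a forecaster is $ε$-calibrated after $T$ rounds when the returned value is at most $εT$.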

Paper Structure

This paper contains 31 sections, 22 theorems, 52 equations, 3 figures, 3 algorithms.

Key Result

Theorem 1.1

Fix a convex set $\mathcal{P}$ and a norm $\| \cdot \|$. Assume there exists a function $R: \mathcal{P} \rightarrow \mathbb{R}$ that is $1$-strongly-convex with respect to $\|\cdot\|$ and has range ($\max_{x \in \mathcal{P}} R(x) - \min_{x \in \mathcal{P}} R(x)$) at most $\rho$. Then $\mathtt{TreeCa

Figures (3)

  • Figure 1: Geometric depiction of the Bregman divergence from $p$ to $y$.
  • Figure 2: [Proof of Lemma \ref{lem:bias-variance-decomp}] The average Bregman divergence (orange + purple) decomposes into the Jensen error (orange) and the Bregman divergence to the mean (purple). For example, when $R(p) = \left\|p\right\|_2^2$, $D_R(y|p) = \left\|y-p\right\|_2^2$ and we recover the bias-variance decomposition.
  • Figure 3: Visualization of the state of $\mathtt{TreeCal}$/$\mathtt{TreeSwap}$ at time step $t$ (about half-way through the algorithm). For $H=3$, we depict the intervals $\Gamma$ of the first three non-root levels of the tree $(l=1,2,3)$. Each rectangular node represents an interval, with sibling nodes separated by red lines. We represent the specific time step $t$ via the vertical dashed green line. The yellow intervals it intersects at each level correspond to the nodes on the root-to-leaf-$t$ path. Accordingly, $\mathbf{x}_t$ will be the uniform distribution over the labels $p$ of these yellow intervals. We see that the algorithm has committed to the labels of all intervals that started at or before time $t$, and has yet to label the future intervals.
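
The decomposition described in the Figure 2 caption can be checked numerically. The sketch below assumes the usual convention $D_R(y|p) = R(y) - R(p) - \langle \nabla R(p), y - p \rangle$, which matches the caption's example $D_R(y|p) = \|y-p\|_2^2$ for $R(p) = \|p\|_2^2$. The helper names (`bregman`, `sq_norm`, `sq_norm_grad`) and the random test data are illustrative, not from the paper.

```python
import numpy as np


def bregman(R, grad_R, y, p):
    """Bregman divergence D_R(y|p) = R(y) - R(p) - <grad R(p), y - p>."""
    return R(y) - R(p) - grad_R(p) @ (y - p)


def sq_norm(x):
    """Regularizer R(x) = ||x||_2^2 from the caption's example."""
    return float(x @ x)


def sq_norm_grad(x):
    """Gradient of ||x||_2^2."""
    return 2.0 * x


rng = np.random.default_rng(0)
ys = rng.random((5, 3))   # hypothetical outcomes y_1, ..., y_5
p = rng.random(3)         # a fixed forecast p
y_bar = ys.mean(axis=0)   # empirical mean of the outcomes

# Left-hand side: average divergence from the forecast to each outcome.
avg_div = np.mean([bregman(sq_norm, sq_norm_grad, y, p) for y in ys])
# Jensen error: mean of R(y_i) minus R at the mean (does not depend on p).
jensen_gap = np.mean([sq_norm(y) for y in ys]) - sq_norm(y_bar)
# Divergence from the forecast to the mean outcome.
div_to_mean = bregman(sq_norm, sq_norm_grad, y_bar, p)

print(avg_div, jensen_gap + div_to_mean)  # the two values should agree
```

For this choice of $R$, the Jensen error is the variance of the outcomes and the divergence to the mean is the squared bias, so only the second term depends on the forecast $p$.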

Theorems & Definitions (40)

  • Theorem 1.1: Informal restatement of Corollary \ref{cor:cauchy}
  • Corollary 1.2: Informal restatement of Corollary \ref{cor:olo-rate}
  • Theorem 1.3: Informal restatement of Theorem \ref{thm:treecal}
  • Theorem 1.4: Informal restatement of Theorem \ref{thm:cal-lb}
  • Lemma 2.1
  • Lemma 2.2
  • Theorem 2.3
  • proof
  • Lemma 2.4
  • Theorem 3.1: Main theorem
  • ...and 30 more