High dimensional online calibration in polynomial time
Binghui Peng
TL;DR
This work delivers the first polynomial-time online forecasting strategy that achieves non-trivial calibration in high dimensions, proving that $\varepsilon$-calibration can be attained after $T = d^{O(1/\varepsilon^2)}$ rounds against adaptive adversaries, and establishing a complementary lower bound of $T \ge d^{\Omega(\log(1/\varepsilon))}$. The method leverages a hierarchical ensemble of sub-forecasters, cross-entropy surrogate losses, and recent swap-regret minimization techniques to realize distributional calibration with per-day cost $O(d\log(1/\varepsilon))$. A key insight is the entropy dynamics across scales, which motivates refining predictions from coarse to fine granularity, while a recursive hard-sequence construction shows that a round complexity polynomial in $d$, with an exponent that grows as $\varepsilon$ shrinks, is unavoidable. Together, these results resolve longstanding COLT questions about the computational efficiency of high-dimensional online calibration and illuminate the trade-offs between dimensionality and calibration accuracy. The findings have implications for downstream tasks such as swap-regret minimization, equilibrium computation, and fairness in predictive systems, where reliable probability calibration is crucial.
Abstract
In online (sequential) calibration, a forecaster predicts probability distributions over a finite outcome space $[d]$ over a sequence of $T$ days, with the goal of being calibrated. While asymptotically calibrated strategies are known to exist, they suffer from the curse of dimensionality: the best known algorithms require $\exp(d)$ days to achieve non-trivial calibration. In this work, we present the first asymptotically calibrated strategy that guarantees non-trivial calibration after a polynomial number of rounds. Specifically, for any desired accuracy $\varepsilon > 0$, our forecaster becomes $\varepsilon$-calibrated after $T = d^{O(1/\varepsilon^2)}$ days. We complement this result with a lower bound, proving that at least $T = d^{\Omega(\log(1/\varepsilon))}$ rounds are necessary to achieve $\varepsilon$-calibration. Our results resolve the open questions posed by [Abernethy-Mannor'11, Hazan-Kakade'12]. Our algorithm is inspired by recent breakthroughs in swap regret minimization [Peng-Rubinstein'24, Dagan et al.'24]. Despite its strong theoretical guarantees, the approach is remarkably simple and intuitive: it randomly selects among a set of sub-forecasters, each of which predicts the empirical outcome frequency over recent time windows.
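To make the last sentence of the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm. It assumes sub-forecasters indexed by a hypothetical list of window lengths (`window_lengths`, chosen here as powers of 4), a uniformly random choice among them each day, and a commonly used $\ell_1$ calibration error that groups days by prediction and sums the norm of the accumulated prediction-minus-outcome bias. The function names, the window schedule, and the random stand-in for the adversary are all assumptions made for illustration; the sketch carries none of the paper's guarantees.

```python
import random
from collections import defaultdict


def window_frequency(outcomes, d, length):
    """Empirical outcome frequency over the last `length` days (uniform if no history)."""
    recent = outcomes[-length:] if length > 0 else []
    if not recent:
        return tuple(1.0 / d for _ in range(d))
    counts = [0] * d
    for x in recent:
        counts[x] += 1
    return tuple(c / len(recent) for c in counts)


def forecast(outcomes, d, window_lengths, rng):
    """Pick one sub-forecaster uniformly at random and return its prediction."""
    length = rng.choice(window_lengths)  # assumption: uniform choice over windows
    return window_frequency(outcomes, d, length)


def l1_calibration_error(predictions, outcomes, d):
    """Group days by prediction, accumulate (prediction - outcome), sum l1 norms, divide by T."""
    bias = defaultdict(lambda: [0.0] * d)
    for p, x in zip(predictions, outcomes):
        b = bias[p]
        for i in range(d):
            b[i] += p[i] - (1.0 if i == x else 0.0)
    return sum(sum(abs(v) for v in b) for b in bias.values()) / len(outcomes)


def run(d=3, T=200, window_lengths=(1, 4, 16, 64), seed=0):
    """Simulate T days and return the resulting l1 calibration error."""
    rng = random.Random(seed)
    outcomes, predictions = [], []
    for _ in range(T):
        predictions.append(forecast(outcomes, d, window_lengths, rng))
        outcomes.append(rng.randrange(d))  # random stand-in for the adversary's choice
    return l1_calibration_error(predictions, outcomes, d)
```

Calling `run()` simulates 200 days over three outcomes and returns a calibration error between 0 and 2. In this toy setup the outcome sequence is random rather than adversarial, so the number it returns says nothing about worst-case behavior.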
