Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling
Yufan Li, Pragya Sur
TL;DR
This work develops a provably calibrated framework for high-dimensional binary classification with Gaussian features. It introduces angular calibration, which interpolates between the informative logits and Gaussian noise according to the angle between the estimated and true weight vectors, and proves both calibration and Bregman-optimality in the proportional regime where $n/d\to c$. It further shows that Platt scaling converges to the angular predictor under suitable conditions, providing a principled high-dimensional guarantee for a widely used method. The alignment angle can be consistently estimated from observable quantities, which makes the approach practical. Numerical experiments support the theory, demonstrating calibration improvements and robustness across simulations and semi-real tasks; extensions to non-Gaussian designs are discussed as future work.
Abstract
We study the fundamental problem of calibrating a linear binary classifier of the form $\sigma(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $\sigma$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the number of samples and the number of features both diverge at a comparable rate. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt scaling predictor converges to our Bregman-optimal calibrated solution. Thus, Platt scaling also provably inherits these desirable properties in high dimensions.
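To make the interpolation idea concrete, the following is a minimal sketch and not the paper's exact estimator. It assumes isotropic Gaussian features $x \sim N(0, I_d)$, a logistic link, a test point independent of $\hat{w}$, and that the angle $\theta = \angle(\hat{w}, w_\star)$ and the signal strength $\|w_\star\|$ are already known (in practice they would have to be estimated). Under these assumptions, the true logit given the normalized score $u = \hat{w}^\top x / \|\hat{w}\|$ is distributed as $\|w_\star\|(\cos\theta \cdot u + \sin\theta \cdot Z)$ with $Z \sim N(0,1)$, so averaging the link over $Z$ gives a conditional label probability. The function name `angular_predict` and its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def angular_predict(x, w_hat, theta, signal_norm, n_nodes=64):
    """Illustrative angular predictor (hypothetical helper, not the paper's API).

    x           : (m, d) array of test features, assumed N(0, I_d)
    w_hat       : (d,) estimated weight vector
    theta       : assumed-known angle between w_hat and the true weight
    signal_norm : assumed-known norm of the true weight
    """
    # Gauss-Hermite nodes/weights for the weight function exp(-z^2 / 2);
    # dividing by sqrt(2*pi) turns them into N(0, 1) expectation weights.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)

    # Normalized score along the estimated direction.
    u = x @ w_hat / np.linalg.norm(w_hat)

    # Mix the informative score with Gaussian noise according to the angle.
    logits = signal_norm * (
        np.cos(theta) * u[:, None] + np.sin(theta) * nodes[None, :]
    )

    # Average the link over the noise component Z.
    return sigmoid(logits) @ weights
```

Two limiting cases illustrate the design: with `theta = 0` the estimator is perfectly aligned and the output reduces to `sigmoid(signal_norm * u)`, while with `theta = np.pi / 2` the estimator carries no information and the output collapses to the chance level 0.5.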
