Towards Instance-Wise Calibration: Local Amortized Diagnostics and Reshaping of Conditional Densities (LADaR)
Biprateep Dey, David Zhao, Brett H. Andrews, Jeffrey A. Newman, Rafael Izbicki, Ann B. Lee
TL;DR
The paper addresses instance-wise calibration of conditional density estimates (CDEs) in complex scientific settings. It introduces the LADaR framework and, within it, Cal-PIT: a regression-based method that learns the conditional CDF of the probability integral transform (PIT) across feature space. The learned map serves two purposes: it provides local calibration diagnostics, and it acts as a probability-to-probability reshaping rule that morphs an initial CDE into a recalibrated one. The approach comes with theoretical guarantees for asymptotic conditional validity of the calibrated intervals and is demonstrated on synthetic data, on probabilistic forecasting from high-dimensional image sequences, and on a galaxy photometric redshift benchmark, where Cal-PIT achieves better instance-wise calibration than all 11 other methods. Because the diagnostics and the reshaping are fully amortized and interpretable, Cal-PIT supports more reliable uncertainty quantification and calibration-aware scientific inference, with relevance to next-generation cosmological surveys and probabilistic weather forecasting.
Abstract
Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable $y$ given complex inputs $\mathbf{x}$. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all $\mathbf{x}$, and when needed, to reshape the densities of $y$ toward "instance-wise" calibration. This paper introduces the LADaR (Local Amortized Diagnostics and Reshaping of Conditional Densities) framework and proposes a new computationally efficient algorithm ($\texttt{Cal-PIT}$) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). $\texttt{Cal-PIT}$ learns a single interpretable local probability--probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs such that they are well-calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, where $\texttt{Cal-PIT}$ achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses.
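As a rough illustration of the idea of a local probability-probability map, the toy sketch below uses only the Python standard library. It is not the authors' implementation: the data-generating process, the deliberately poor initial model, the function names, and the use of simple binning in $x$ as the regression step are all assumptions made for this example. The true conditional is $y \mid x \sim N(x, 1)$, while the hypothetical initial model always predicts $N(0, 1)$, so the exact local PIT-CDF in this toy is $\Phi(\Phi^{-1}(\gamma) - x)$.

```python
import bisect
import math
import random


def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def initial_cdf(y, x):
    """Hypothetical initial CDE (assumption): ignores x, always predicts N(0, 1)."""
    return std_normal_cdf(y)


def fit_local_pit_cdf(xs, ys, n_bins=20, lo=-1.0, hi=1.0):
    """Estimate r(gamma; x) = P(PIT <= gamma | x) from calibration data.

    Binning in x is a crude stand-in (assumption) for a flexible regression
    model; each bin stores the sorted PIT values of its calibration points.
    """
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        k = min(int((x - lo) / width), n_bins - 1)
        bins[k].append(initial_cdf(y, x))  # PIT value of this point
    for pits in bins:
        pits.sort()

    def r(gamma, x):
        k = min(int((x - lo) / width), n_bins - 1)
        pits = bins[k]
        # Empirical fraction of PIT values in this bin that are <= gamma.
        return bisect.bisect_right(pits, gamma) / len(pits)

    return r


def recalibrated_cdf(y, x, r):
    """Reshape the initial CDF by composing it with the local P-P map."""
    return r(initial_cdf(y, x), x)


# Synthetic calibration set: x ~ Uniform(-1, 1), y | x ~ N(x, 1).
random.seed(0)
n = 40000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [random.gauss(x, 1.0) for x in xs]
r_hat = fit_local_pit_cdf(xs, ys)

x0 = 0.55
# Diagnostic: a calibrated model would give a value near 0.5 here.
diagnostic = r_hat(0.5, x0)
# Reshaping: evaluate initial and recalibrated CDFs at y = x0,
# which is the true conditional median in this toy setup.
before = initial_cdf(x0, x0)
after = recalibrated_cdf(x0, x0, r_hat)
print(f"r_hat(0.5; x0) = {diagnostic:.3f}")
print(f"CDF at true median: initial = {before:.3f}, recalibrated = {after:.3f}")
```

In this toy, a diagnostic value well below 0.5 at `x0` indicates that the initial model places too much mass at low $y$ for that input, and composing the initial CDF with the estimated map moves the CDF at the true median toward 0.5. A single fitted map is reused for every query point, which is the sense in which such a diagnostic is amortized.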
