Towards Instance-Wise Calibration: Local Amortized Diagnostics and Reshaping of Conditional Densities (LADaR)

Biprateep Dey; David Zhao; Brett H. Andrews; Jeffrey A. Newman; Rafael Izbicki; Ann B. Lee

TL;DR

The paper addresses instance-wise calibration of conditional density estimates (CDEs) in complex scientific settings. It introduces the LADaR framework and the Cal-PIT algorithm, a regression-based method that learns the conditional CDF of the probability integral transform (PIT-CDF). The learned map provides local calibration diagnostics across feature space and a probability-to-probability rule for reshaping initial CDEs into recalibrated ones. The approach comes with theoretical guarantees of asymptotic conditional validity for the calibrated intervals and is demonstrated on synthetic data, high-dimensional sequence forecasting, and a galaxy photometric redshift benchmark, where Cal-PIT achieves better instance-wise calibration than the 11 other methods compared. Because the diagnostics and reshaping are fully amortized and interpretable, Cal-PIT supports more reliable uncertainty quantification, with intended applications in next-generation cosmological surveys and probabilistic weather forecasting.

Abstract

Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable $y$ given complex inputs $\mathbf{x}$. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all $\mathbf{x}$, and when needed, to reshape the densities of $y$ toward "instance-wise" calibration. This paper introduces the LADaR (Local Amortized Diagnostics and Reshaping of Conditional Densities) framework and proposes a new computationally efficient algorithm ($\texttt{Cal-PIT}$) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). $\texttt{Cal-PIT}$ learns a single interpretable local probability--probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs such that they are well-calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, where $\texttt{Cal-PIT}$ achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses.
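The idea of a local probability-probability map can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the deliberately miscalibrated initial model, the window width, and the function names `initial_cdf`, `local_pp_map`, and `recalibrated_cdf` are all assumptions made for illustration, and a simple window average over nearby calibration points stands in for the regression that $\texttt{Cal-PIT}$ uses to learn the map.

```python
import random
from statistics import NormalDist

random.seed(0)

# Toy calibration data (assumption): the true model is y | x ~ N(x, 1).
xs = [random.uniform(-1.0, 1.0) for _ in range(4000)]
ys = [random.gauss(x, 1.0) for x in xs]


def initial_cdf(y, x):
    """Deliberately miscalibrated initial model: ignores x and uses N(0, 1)."""
    return NormalDist(0.0, 1.0).cdf(y)


# PIT value of each calibration point under the initial model.
pits = [initial_cdf(y, x) for x, y in zip(xs, ys)]


def local_pp_map(gamma, x0, width=0.1):
    """Estimate r(gamma; x0) = P(PIT <= gamma | x = x0).

    Averages the indicator 1{PIT <= gamma} over calibration points with
    |x - x0| <= width. This window average is a stand-in (assumption) for
    a learned regression over (x, gamma).
    """
    near = [p for x, p in zip(xs, pits) if abs(x - x0) <= width]
    return sum(p <= gamma for p in near) / len(near)


def recalibrated_cdf(y, x0):
    """Reshape the initial CDF by composing it with the local P-P map."""
    return local_pp_map(initial_cdf(y, x0), x0)


# At x0 = 0.8 the initial model puts the true median (y = 0.8) far above
# its own median; the reshaped CDF should place it close to 0.5.
print(initial_cdf(0.8, 0.8), recalibrated_cdf(0.8, 0.8))
```

In this sketch the map doubles as a diagnostic: a value of `local_pp_map(0.5, x0)` far from 0.5 flags a location where the initial model is biased, while values near 0.5 (as around `x0 = 0`) indicate that the median, at least, is well placed there.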

Paper Structure (25 sections, 6 theorems, 39 equations, 11 figures, 1 table, 1 algorithm)

Introduction
Trustworthy Uncertainty Quantification
Well-Calibrated CDEs are Essential for the Physical Sciences
Our Contribution
Related Work
Methodology
Overview of the Cal-PIT Algorithm
Estimating the PIT-CDF
Reshaping Conditional Densities by Mapping Probabilities to Probabilities
Synthetic Examples
Example 1: Diagnostics and Reshaping of CDEs via P-P maps
Example 2: Probabilistic Nowcasting with High-Dimensional Sequence Data as Inputs
Example 3: Prediction Sets
Main Application: Reshaping CDEs of Galaxy Photometric Redshifts
Discussion
...and 10 more sections

Key Result

Theorem 1

Under Assumptions assump:continuity, assump:dominates and assump:bounded (sec:theory),

Figures (11)

Figure 1: Schematic representation of the LADaR approach. Our approach starts with an initial (e.g., physics-based or large pre-trained) model of the predictive distribution of a target quantity. We then assess the quality of the initial conditional density estimates (CDEs) on an individual basis across the feature space using calibration data, and reshape the densities if deemed necessary. The goal is not to replace the initial model with a different end-to-end density estimator, but rather to adjust it, ensuring both calibration and insight into its potential failure modes (see Figure 2 for how to interpret P-P plots). The LADaR approach is particularly relevant when there are insufficient observational data to independently fit a purely machine-learning-based CDE, or when it is important to tie predictions to the underlying physical processes (encoded by the chosen feature space) to establish trust in machine-learning methods. Our framework is fully "amortized" over both features ${\mathbf{x}}$ and response variable $y$, which means that once we have trained LADaR to learn the map between the initial CDE model and the CDE of the calibration data, no additional training is required for new data.
Figure 2: Interpretable diagnostics. P-P plots are commonly used to assess how well a probability density model fits actual data. Such plots display, in a clear and interpretable way, effects like bias (left panel) and dispersion (right panel) in an estimated distribution $\widehat{f}$ vis-a-vis the true data-generating distribution $f$. Our framework yields an amortized approach to constructing local P-P plots for comparing Bayesian posteriors $\widehat{f}(\theta|{\mathbf{x}})$ or predictive densities $\widehat{f}(y|{\mathbf{x}})$ at any location ${\mathbf{x}}$ of the feature space $\mathcal{X}$. Figure adapted from Zhao et al. (2021). An interactive version of this figure can be found at: https://lee-group-cmu.github.io/cal-pit-paper/fig_1_interactive/.
Figure 3: Illustration of the LADaR framework: Example 1, skewed data. The initial CDE is Gaussian, but the true distribution is skewed. Top panel (I): Local discrepancy score across the input space (first row) and examples of diagnostic P-P plots (second row). Cal-PIT identifies that the model is positively/negatively biased relative to calibration data at $X =-1$ / $X =1$ but well-estimated at $X =0$. The diagnostics define a family of P-P maps for reshaping the initial densities to fit the calibration data across the feature space. Top panel (II): Continuous morphing of densities via Cal-PIT, illustrated at the three evaluation points, from the initial Gaussian distributions (red; $s=0$) to the final distributions (blue; $s=1$). For illustrative purposes, we have included intermediate values of $s$ to show the morphing of distributions. Bottom panel: Independent assessment of final results by computing a local Monte Carlo version of the continuous ranked probability score (MC-CRPS) at fixed $x$ before and after Cal-PIT.
Figure 4: TC satellite images. Left: A sequence of TC-centered cloud-top temperature images from GOES. Center: We represent each GOES image with a radial profile of azimuthally-averaged cloud-top temperatures. Right: The 24-hour sequence of consecutive radial profiles, sampled every 30 minutes, defines a structural trajectory $\mathbf{S}_{<t}$ or Hovmöller diagram. Figure adapted from McNeely et al. (2022).
Figure 5: Synthetic data in Example 2. Simulated radial profiles $\{{\mathbf{X}}_t\}_{t \geq 0}$ and intensities $\{Y_t\}_{t \geq 0}$ for an example TC. Left: Each row represents the radial profile ${\mathbf{X}}_t$ of temperature as a function of radial distance from the storm center at time $t$. Our predictors are 48-hour overlapping sequences $\{\mathbf{S}_t\}_{t \geq 0}$ with data from the same "storm" being highly dependent. Right: The target response, here shown as a time series $\{Y_t\}_{t \geq 0}$ of simulated TC intensities.
...and 6 more figures
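
The bias and dispersion signatures that the Figure 2 caption attributes to P-P plots can be reproduced with a short sketch. The setup is an assumption for illustration only: a standard normal "truth", two Gaussian models (one with a shifted mean, one too wide), and a hypothetical helper `pp_curve`; none of these come from the paper.

```python
from statistics import NormalDist


def pp_curve(model, truth, gammas):
    """P-P curve r(gamma) = P(F_model(Y) <= gamma) for Y drawn from `truth`.

    For continuous, strictly increasing CDFs this equals
    F_truth(F_model^{-1}(gamma)), so no sampling is needed.
    """
    return [truth.cdf(model.inv_cdf(g)) for g in gammas]


truth = NormalDist(0.0, 1.0)
gammas = [0.1, 0.25, 0.5, 0.75, 0.9]

# Model whose mean is too high: the curve lies above the diagonal everywhere.
biased = pp_curve(NormalDist(0.5, 1.0), truth, gammas)

# Model that is too wide: the curve is below the diagonal for gamma < 0.5,
# above it for gamma > 0.5, and crosses it at gamma = 0.5.
overdispersed = pp_curve(NormalDist(0.0, 2.0), truth, gammas)

for g, b, d in zip(gammas, biased, overdispersed):
    print(f"gamma={g:.2f}  biased={b:.3f}  overdispersed={d:.3f}")
```

A perfectly calibrated model would return `gammas` unchanged, which is the diagonal of the P-P plot; the two models above depart from it in the one-sided and S-shaped ways computed here.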

Theorems & Definitions (15)

Definition 1: Recalibrated CDE
Theorem 1: Performance of the recalibrated CDE
Remark 1
Remark 2: CDEs and Prediction Sets
Corollary 1: Convergence rate of recalibrated CDE
Theorem 2: Consistency and conditional coverage of Cal-PIT intervals
Lemma 1
proof
Lemma 2
proof
...and 5 more
