Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence
Spencer Young, Riley Sinema, Cole Edgren, Andrew Hall, Nathan Dong, Porter Jenkins
TL;DR
The paper addresses the inadequacy of calibration-based metrics for judging probabilistic predictive fit in neural regressors. It introduces conditional congruence and the Conditional Congruence Error (CCE), built on the maximum conditional mean discrepancy (MCMD) and conditional kernel mean embeddings, to provide a point-wise, input-specific measure of how closely a model's predictive distribution matches the true conditional distribution. Through theoretical guarantees and extensive experiments on image regression datasets, CCE is shown to be correct, monotonic, reliable, and robust, with the added benefit of diagnosing point-wise failures without labels. The work demonstrates that CCE outperforms traditional metrics like ECE and NLL in characterizing probabilistic alignment, supporting more reliable deployment and enabling applications such as selective rejection based on congruence. It also offers practical guidance on hyperparameters and computation, paving the way for broader adoption in uncertainty quantification for regression tasks.
Abstract
While significant progress has been made in specifying neural networks capable of representing uncertainty, deep networks still often suffer from overconfidence and misaligned predictive distributions. Existing approaches for measuring this misalignment are primarily developed under the framework of calibration, with common metrics such as Expected Calibration Error (ECE). However, calibration can only provide a strictly marginal assessment of probabilistic alignment. Consequently, calibration metrics such as ECE are $\textit{distribution-wise}$ measures and cannot diagnose the $\textit{point-wise}$ reliability of individual inputs, which is important for real-world decision-making. We propose a stronger condition, which we term $\textit{conditional congruence}$, for assessing probabilistic fit. We also introduce a metric, Conditional Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance, at any point, between the learned predictive distribution and the empirical conditional distribution in a dataset. We perform several high-dimensional regression tasks and show that CCE exhibits four critical properties: $\textit{correctness}$, $\textit{monotonicity}$, $\textit{reliability}$, and $\textit{robustness}$.
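
To make the distinction between a marginal and a point-wise assessment concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional toy problem, Gaussian kernels, and illustrative hyperparameters (`lam`, `gamma_x`, `gamma_y`), and uses one common plug-in form of a kernel-ridge estimate of the discrepancy between conditional mean embeddings. The two models, the helper names, and the evaluation grid are all hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Toy data (assumed): x ~ N(0, 1) and y | x ~ N(x, 1), so marginally y ~ N(0, 2).
x = rng.normal(size=n)
y = x + rng.normal(size=n)

# Two hypothetical predictive models, each sampled once per training input.
y_congruent = x + rng.normal(size=n)               # predicts N(x, 1), matching y | x
y_marginal = math.sqrt(2.0) * rng.normal(size=n)   # predicts N(0, 2) for every x

# Standard normal CDF, applied element-wise.
std_normal_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def calibration_error(pit):
    """Mean gap between nominal levels and empirical coverage of PIT values."""
    levels = np.linspace(0.05, 0.95, 19)
    return float(np.mean([abs(np.mean(pit <= p) - p) for p in levels]))


# Probability integral transform of each label under each model's predictive CDF.
cal_congruent = calibration_error(std_normal_cdf(y - x))
cal_marginal = calibration_error(std_normal_cdf(y / math.sqrt(2.0)))


def rbf(a, b, gamma):
    """Gaussian kernel matrix between two 1-D arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-gamma * d ** 2)


def mcmd(x_eval, x_data, y_data, x_model, y_model,
         lam=0.1, gamma_x=1.0, gamma_y=0.5):
    """Plug-in discrepancy between conditional mean embeddings at each x_eval."""
    n_d, n_m = len(x_data), len(x_model)
    # Regularized inverses of the input kernel matrices (kernel ridge weights).
    w_d = np.linalg.inv(rbf(x_data, x_data, gamma_x) + n_d * lam * np.eye(n_d))
    w_m = np.linalg.inv(rbf(x_model, x_model, gamma_x) + n_m * lam * np.eye(n_m))
    a = rbf(x_eval, x_data, gamma_x) @ w_d    # weights on data labels
    b = rbf(x_eval, x_model, gamma_x) @ w_m   # weights on model samples
    # Squared RKHS distance between the two weighted embeddings, per eval point.
    sq = (np.einsum("ei,ij,ej->e", a, rbf(y_data, y_data, gamma_y), a)
          - 2.0 * np.einsum("ei,ij,ej->e", a, rbf(y_data, y_model, gamma_y), b)
          + np.einsum("ei,ij,ej->e", b, rbf(y_model, y_model, gamma_y), b))
    return np.sqrt(np.clip(sq, 0.0, None))


# Average the point-wise discrepancy over a small evaluation grid.
x_eval = np.linspace(-2.0, 2.0, 9)
cce_congruent = float(mcmd(x_eval, x, y, x, y_congruent).mean())
cce_marginal = float(mcmd(x_eval, x, y, x, y_marginal).mean())

print("calibration error:", cal_congruent, cal_marginal)
print("mean discrepancy: ", cce_congruent, cce_marginal)
```

Under these assumptions both models should appear well calibrated, since each produces approximately uniform PIT values, while the averaged point-wise discrepancy should be larger for the input-independent model, most visibly at evaluation points far from $x = 0$.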
