Multivariate Latent Recalibration for Conditional Normalizing Flows
Victor Dheur, Souhaib Ben Taieb
TL;DR
This work tackles the challenge of calibrating multivariate conditional distributions learned by flexible generative models. It introduces latent recalibration (LR), a post-hoc method that operates in the latent space of conditional normalizing flows to achieve multivariate latent calibration and produce an explicit recalibrated PDF. LR comes with finite-sample bounds on latent calibration and remains computationally efficient. Compared with uncalibrated baselines, it improves latent calibration and often achieves better negative log-likelihood on tabular and high-dimensional image data. By connecting LR with conformal prediction and highest density regions (HDRs), the approach offers practical uncertainty quantification with explicit density estimates that can be used in downstream decision-making.
Abstract
Reliably characterizing the full conditional distribution of a multivariate response variable given a set of covariates is crucial for trustworthy decision-making. However, misspecified or miscalibrated multivariate models may yield a poor approximation of the joint distribution of the response variables, leading to unreliable predictions and suboptimal decisions. Furthermore, standard recalibration methods are primarily limited to univariate settings, while conformal prediction techniques, despite generating multivariate prediction regions with coverage guarantees, do not provide a full probability density function. We address this gap by first introducing a novel notion of latent calibration, which assesses probabilistic calibration in the latent space of a conditional normalizing flow. Second, we propose latent recalibration (LR), a novel post-hoc model recalibration method that learns a transformation of the latent space with finite-sample bounds on latent calibration. Unlike existing methods, LR produces a recalibrated distribution with an explicit multivariate density function while remaining computationally efficient. Extensive experiments on both tabular and image datasets show that LR consistently improves latent calibration error and the negative log-likelihood of the recalibrated models.
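To make the idea of checking and correcting calibration in the latent space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a standard normal base distribution, so that the squared norm of a latent vector of dimension d follows a chi-square distribution with d degrees of freedom, and it swaps in a plain empirical-CDF map as the recalibration step. All function names are hypothetical, the latents are simulated rather than produced by a trained flow, and the closed-form chi-square CDF used here only covers even d.

```python
import bisect
import math
import random


def chi2_cdf_even(x, d):
    """Closed-form chi-square CDF, valid only for even degrees of freedom d."""
    if d <= 0 or d % 2 != 0:
        raise ValueError("this sketch only supports even d > 0")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, d // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total


def latent_pit(z):
    """PIT of the squared latent norm, assuming a standard normal base distribution."""
    return chi2_cdf_even(sum(v * v for v in z), len(z))


def calibration_error(pits):
    """Mean absolute gap between the empirical CDF of the PITs and the uniform CDF."""
    s = sorted(pits)
    n = len(s)
    return sum(abs((i + 1) / n - u) for i, u in enumerate(s)) / n


def fit_recalibration_map(cal_pits):
    """Empirical CDF of calibration PITs, used as a monotone map on [0, 1] (assumed form)."""
    s = sorted(cal_pits)
    n = len(s)
    return lambda u: bisect.bisect_right(s, u) / n


def simulate_latents(n, d, scale, rng):
    """Stand-in for flow latents; scale != 1 mimics a miscalibrated model."""
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def demo(seed=0, n=2000, d=4, scale=1.5):
    rng = random.Random(seed)
    cal = [latent_pit(z) for z in simulate_latents(n, d, scale, rng)]
    test = [latent_pit(z) for z in simulate_latents(n, d, scale, rng)]
    recal = fit_recalibration_map(cal)
    before = calibration_error(test)
    after = calibration_error([recal(u) for u in test])
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"latent calibration error before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the map is fitted on a held-out calibration split and evaluated on a separate test split, mirroring the post-hoc nature of the method. Turning such a map into an explicit recalibrated density requires a differentiable transformation, which this sketch does not attempt.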
