How Safe Will I Be Given What I Saw? Calibrated Prediction of Safety Chances for Image-Controlled Autonomy
Zhenjiang Mao, Mrinall Eashaan Umasudhan, Ivan Ruchkin
TL;DR
The paper tackles calibrated safety prediction for image-controlled autonomous systems under partial observability and distribution shift by introducing a modular framework that combines world-model latent forecasting, monolithic and composite predictors, unsupervised domain adaptation (MEMO), and conformal calibration. It demonstrates that decomposing the problem into latent representation, trajectory forecasting, and safety evaluation provides strong long-horizon performance, particularly when equipped with attention-based forecasters and latent evaluators. A key contribution is the development of post-hoc conformal calibration and adaptive binning to produce reliable probability intervals for safety predictions, with theoretical guarantees. Experimentally, the approach is validated on racing-car, cart-pole, and Donkey Car benchmarks, showing improved F1, reduced false positives, robust performance under distribution shift, and reliable calibration bounds across horizons.
Abstract
Autonomous robots that rely on deep neural network controllers pose critical challenges for safety prediction, especially under partial observability and distribution shift. Traditional model-based verification techniques are limited in scalability and require access to low-dimensional state models, while model-free methods often lack reliability guarantees. This paper addresses these limitations by introducing a framework for calibrated safety prediction in end-to-end vision-controlled systems, where neither the state-transition model nor the observation model is accessible. Building on the foundation of world models, we leverage variational autoencoders and recurrent predictors to forecast future latent trajectories from raw image sequences and estimate the probability of satisfying safety properties. We distinguish between monolithic and composite prediction pipelines and introduce a calibration mechanism to quantify prediction confidence. In long-horizon predictions from high-dimensional observations, the forecasted inputs to the safety evaluator can deviate significantly from the training distribution due to compounding prediction errors and changing environmental conditions, leading to miscalibrated risk estimates. To address this, we incorporate unsupervised domain adaptation to ensure robustness of safety evaluation under distribution shift in predictions without requiring manual labels. Our formulation provides theoretical calibration guarantees and supports practical evaluation across long prediction horizons. Experimental results on three benchmarks show that our UDA-equipped evaluators maintain high accuracy and substantially lower false positive rates under distribution shift. Similarly, world model-based composite predictors outperform their monolithic counterparts on long-horizon tasks, and our conformal calibration provides reliable statistical bounds.
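To make the idea of a conformal bound on safety-chance predictions concrete, here is a minimal Python sketch. It is a generic split-conformal construction, not the paper's exact procedure. The batch-level score (the absolute gap between the mean predicted safety chance and the empirical safety rate), the batch size, the synthetic data, and all function names are assumptions made for illustration.

```python
import math
import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    This is the standard split-conformal quantile; it is infinite when
    there are too few scores for the requested miscoverage level alpha.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return float(scores[k - 1])


def batch_calibration_errors(probs, labels, batch_size, rng):
    """Hypothetical score: |mean predicted chance - empirical safety rate|
    computed on disjoint random batches of held-out calibration data."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = rng.permutation(probs.size)
    n_batches = probs.size // batch_size
    errors = []
    for b in range(n_batches):
        idx = order[b * batch_size:(b + 1) * batch_size]
        errors.append(abs(probs[idx].mean() - labels[idx].mean()))
    return np.array(errors)


def safety_interval(mean_prob, q):
    """Interval around a batch's mean predicted safety chance, clipped to [0, 1]."""
    return max(0.0, mean_prob - q), min(1.0, mean_prob + q)


# Synthetic stand-in for held-out predictions and safety outcomes (assumption).
rng = np.random.default_rng(0)
cal_probs = rng.uniform(0.0, 1.0, size=2000)
cal_labels = rng.random(2000) < cal_probs  # outcomes drawn to match the predictions

errors = batch_calibration_errors(cal_probs, cal_labels, batch_size=50, rng=rng)
q = conformal_quantile(errors, alpha=0.1)
low, high = safety_interval(0.8, q)
print(f"bound on calibration error: {q:.3f}; interval for 0.8: [{low:.3f}, {high:.3f}]")
```

Under the usual exchangeability assumption, the calibration error of a fresh batch drawn the same way falls below `q` with probability at least `1 - alpha`. This is the kind of statistical bound the abstract refers to, though the paper's own scores and binning scheme may differ.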
