How Safe Am I Given What I See? Calibrated Prediction of Safety Chances for Image-Controlled Autonomy

Zhenjiang Mao; Carson Sobolewski; Ivan Ruchkin

TL;DR

This work tackles online safety prediction for image-controlled autonomous systems that lack a low-dimensional dynamical state. It introduces a configurable family of world-model–based predictors that operate on high-dimensional observations, with two predictor types (safety labels and safety chances) trained offline on controller-specific or controller-independent data. A post-hoc conformal calibration framework, combined with adaptive binning, provides statistical guarantees that the predicted safety probabilities bound the true safety rates with coverage $1-\alpha$. Empirical results on racing car and cart pole benchmarks show that latent predictors and calibrated, conformally bounded safety chances yield more reliable decisions than uncalibrated or purely image-based approaches, which supports practical online safety interventions. The approach is calibration-friendly and does not depend on a low-dimensional state, so it offers scalable, interpretable reliability for high-dimensional perception in autonomous systems, with potential for broader safety assurances in vision-based control.

Abstract

End-to-end learning has emerged as a major paradigm for developing autonomous systems. Unfortunately, with its performance and convenience comes an even greater challenge of safety assurance. A key factor of this challenge is the absence of the notion of a low-dimensional and interpretable dynamical state, around which traditional assurance methods revolve. Focusing on the online safety prediction problem, this paper proposes a configurable family of learning pipelines based on generative world models, which do not require low-dimensional states. To implement these pipelines, we overcome the challenges of learning safety-informed latent representations and missing safety labels under prediction-induced distribution shift. These pipelines come with statistical calibration guarantees on their safety chance predictions based on conformal prediction. We perform an extensive evaluation of the proposed learning pipelines on two case studies of image-controlled systems: a racing car and a cartpole.

Paper Structure

This paper contains 31 sections, 1 theorem, 7 equations, 6 figures, 17 tables, 1 algorithm.

Introduction
Preliminaries
Problem Setting
Predictors and Datasets
Safety Predictor Family
Monolithic and Composite Label Predictors
Training Process
Conformal Calibration for Chance Predictors
Experimental Results
Systems
Performance Metrics
Experimental Setup
Hardware
Dataset
Training details
...and 16 more sections

Key Result

Theorem 1

Theorem 2.1 in (Lei et al. 2018). Given a dataset bin $B = \{b_k\}_{k=1}^{K}$ of i.i.d. observation-state pairs $b_k = (y_k, x_k)$, we obtain a collection of datasets $\{B_j\}_{j=1}^{M}$ by drawing $M$ datasets of $N$ i.i.d. samples from $B$, so that the datasets $B_j$ are drawn i.i.d. from a dat... where $\overline{q}(B_{M+1})$ is the mean safety chance prediction in $B_{M+1}$ and $\overline{p}$...
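The theorem statement above is cut off, so the following is only a sketch of how a conformal bound of this kind is commonly computed, not the paper's exact procedure. It assumes the nonconformity score of each resampled dataset $B_j$ is the absolute gap between its mean predicted safety chance $\overline{q}(B_j)$ and its empirical safety rate $\overline{p}(B_j)$, and that the bound is the $\lceil (1-\alpha)(M+1) \rceil$-th smallest score. The function name `conformal_calibration_bound` and its parameters are hypothetical.

```python
import math
import random


def conformal_calibration_bound(preds, labels, n_samples, n_datasets, alpha, seed=0):
    """Sketch of a conformal bound on |mean predicted chance - safety rate| in one bin.

    preds:  predicted safety chances in [0, 1] for the items in the bin
    labels: observed safety outcomes (1 = safe, 0 = unsafe), same length as preds
    Assumption: the score of a resampled dataset is |mean(preds) - mean(labels)|.
    """
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must be non-empty and equally long")
    rng = random.Random(seed)
    indices = list(range(len(preds)))
    scores = []
    for _ in range(n_datasets):
        # Draw N items from the bin with replacement to form one dataset B_j.
        sample = rng.choices(indices, k=n_samples)
        mean_q = sum(preds[i] for i in sample) / n_samples
        mean_p = sum(labels[i] for i in sample) / n_samples
        scores.append(abs(mean_q - mean_p))
    scores.sort()
    # Standard split-conformal rank; with too few datasets no finite bound exists.
    rank = math.ceil((1 - alpha) * (n_datasets + 1))
    if rank > n_datasets:
        return math.inf
    return scores[rank - 1]
```

Under these assumptions, a fresh dataset drawn the same way would have its calibration gap below the returned value with probability at least $1-\alpha$. The infinite return value makes the "too few resamples for this $\alpha$" case explicit instead of silently reporting the largest observed score.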

Figures (6)

Figure 1: Dynamical system with predictors. Arrows show data flow, and dashes show optional controller dependence.
Figure 2: Our predictors (from top to bottom): monolithic, monolithic latent, composite image-based, and composite latent-based.
Figure 3: Upper: Car observations, L to R: safe car from $\mathbf{Z}$, unsafe car from $\mathbf{Z}$, safe car from $d(f_{l}(e(\mathbf{Z})))$, safe car from $f_{g}(\mathbf{Z})$. Lower: Cart pole observations. In both rows, the two rightmost images are distribution-shifted.
Figure 4: Performance of safety label predictors over varied horizons. L to R: (1) controller-specific ('c-sp') monolithic ('mon') vs. latent composite ('comp') for the racing car; (2) controller-independent ('ind') monolithic vs. latent composite for the racing car; (3) controller-specific monolithic vs. latent composite for the cart pole; (4) controller-independent monolithic vs. latent composite for the cart pole. Shaded uncertainty shows standard deviation due to different controllers and resampling.
Figure 5: Calibration of a monolithic predictor (CNN, controller-specific) for the racing car with horizon $k=100$. Left: uncalibrated; right: calibrated with isotonic regression and conformal bounds for $\alpha=0.05$.
...and 1 more figure
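The composite latent predictor named in Figure 2, and the $d(f_{l}(e(\mathbf{Z})))$ notation in Figure 3, suggest a pipeline that encodes observations, rolls a forecaster forward in latent space, and then evaluates safety. The sketch below illustrates only that composition. The function `composite_latent_predict`, its sliding-window rollout, and the toy scalar components are all hypothetical stand-ins, not the learned models from the paper.

```python
from typing import Callable, List, Sequence, TypeVar

Obs = TypeVar("Obs")
Latent = TypeVar("Latent")


def composite_latent_predict(
    observations: Sequence[Obs],
    encode: Callable[[Obs], Latent],
    forecast_latent: Callable[[List[Latent]], Latent],
    evaluate: Callable[[Latent], bool],
    horizon: int,
) -> bool:
    """Predict a safety label `horizon` steps ahead by forecasting in latent space."""
    # Encode the recent observation window once.
    latents = [encode(o) for o in observations]
    for _ in range(horizon):
        # Predict the next latent state and slide the window forward by one step.
        next_latent = forecast_latent(latents)
        latents = latents[1:] + [next_latent]
    # Judge safety on the final forecast latent state.
    return evaluate(latents[-1])


# Toy stand-ins: scalar "observations", identity encoder, linear extrapolation,
# and a threshold evaluator. Real components would be learned networks.
def toy_encode(o: float) -> float:
    return o


def toy_forecast(window: List[float]) -> float:
    return 2 * window[-1] - window[-2]


def toy_is_safe(z: float) -> bool:
    return abs(z) < 1.0
```

With the toy components, `composite_latent_predict([0.0, 0.1], toy_encode, toy_forecast, toy_is_safe, horizon=5)` stays inside the safe threshold, while a long enough horizon drifts outside it.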

Theorems & Definitions (10)

Definition 1: Dynamical system
Definition 2: Safety label predictor
Definition 3: Safety chance predictor
Definition 4: Observation-controller dataset
Definition 5: Observation-action dataset
Definition 6: Monolithic latent predictor
Definition 7: Composite image predictor
Definition 8: Composite latent predictor
Definition 9: Adaptive binning
Theorem 1
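Definition 9 is listed here by name only. One common reading of adaptive binning is to group predictions into bins of equal sample count rather than equal width, so that every bin has enough data for a calibration estimate. The sketch below follows that reading as an assumption. The function `adaptive_bins` and the choice to merge a short trailing bin into its neighbor are hypothetical.

```python
from typing import List, Sequence


def adaptive_bins(preds: Sequence[float], bin_size: int) -> List[List[int]]:
    """Group item indices into bins of (roughly) equal count, ordered by prediction.

    Assumption: "adaptive" means equal-count bins over sorted predicted chances.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # Sort item indices by their predicted safety chance.
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    bins = [order[s:s + bin_size] for s in range(0, len(order), bin_size)]
    # Merge a short trailing bin into the previous one so no bin is under-filled.
    if len(bins) > 1 and len(bins[-1]) < bin_size:
        tail = bins.pop()
        bins[-1].extend(tail)
    return bins
```

Each returned bin could then be passed separately to a per-bin calibration step, since the theorem above is stated for a single dataset bin $B$.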
