Table of Contents
Fetching ...

Deterministic World Models for Verification of Closed-loop Vision-based Systems

Yuang Geng, Zhuoyang Zhou, Zhongzheng Zhang, Siyuan Pan, Hoang-Dung Tran, Ivan Ruchkin

TL;DR

This work tackles the challenge of verifying safety for vision-based closed-loop control by replacing stochastic camera surrogates with a deterministic decoder that maps physical states directly to images. A dual-loss trained Deterministic World Model (DWM) enables precise, state-driven image generation, which, when combined with Star-based reachability (StarV) and layer-wise propagation, yields tight, scalable verification for image-based controllers. The authors further bridge the surrogate to the real system using conformal prediction to bound trajectory deviations and inflate reachable sets for probabilistic guarantees, achieving rigorous containment of real trajectories with high confidence. Across CartPole, MountainCar, and Pendulum, the approach delivers superior precision, higher F1 scores, and reduced conservatism compared with latent-variable baselines, demonstrating practical impact for safety-critical vision-enabled systems.

Abstract

Verifying closed-loop vision-based control systems remains a fundamental challenge due to the high dimensionality of images and the difficulty of modeling visual environments. While generative models are increasingly used as camera surrogates in verification, their reliance on stochastic latent variables introduces unnecessary overapproximation error. To address this bottleneck, we propose a Deterministic World Model (DWM) that maps system states directly to generative images, effectively eliminating uninterpretable latent variables to ensure precise input bounds. The DWM is trained with a dual-objective loss function that combines pixel-level reconstruction accuracy with a control difference loss to maintain behavioral consistency with the real system. We integrate DWM into a verification pipeline utilizing Star-based reachability analysis (StarV) and employ conformal prediction to derive rigorous statistical bounds on the trajectory deviation between the world model and the actual vision-based system. Experiments on standard benchmarks show that our approach yields significantly tighter reachable sets and better verification performance than a latent-variable baseline.

Deterministic World Models for Verification of Closed-loop Vision-based Systems

TL;DR

This work tackles the challenge of verifying safety for vision-based closed-loop control by replacing stochastic camera surrogates with a deterministic decoder that maps physical states directly to images. A dual-loss trained Deterministic World Model (DWM) enables precise, state-driven image generation, which, when combined with Star-based reachability (StarV) and layer-wise propagation, yields tight, scalable verification for image-based controllers. The authors further bridge the surrogate to the real system using conformal prediction to bound trajectory deviations and inflate reachable sets for probabilistic guarantees, achieving rigorous containment of real trajectories with high confidence. Across CartPole, MountainCar, and Pendulum, the approach delivers superior precision, higher F1 scores, and reduced conservatism compared with latent-variable baselines, demonstrating practical impact for safety-critical vision-enabled systems.

Abstract

Verifying closed-loop vision-based control systems remains a fundamental challenge due to the high dimensionality of images and the difficulty of modeling visual environments. While generative models are increasingly used as camera surrogates in verification, their reliance on stochastic latent variables introduces unnecessary overapproximation error. To address this bottleneck, we propose a Deterministic World Model (DWM) that maps system states directly to generative images, effectively eliminating uninterpretable latent variables to ensure precise input bounds. The DWM is trained with a dual-objective loss function that combines pixel-level reconstruction accuracy with a control difference loss to maintain behavioral consistency with the real system. We integrate DWM into a verification pipeline utilizing Star-based reachability analysis (StarV) and employ conformal prediction to derive rigorous statistical bounds on the trajectory deviation between the world model and the actual vision-based system. Experiments on standard benchmarks show that our approach yields significantly tighter reachable sets and better verification performance than a latent-variable baseline.

Paper Structure

This paper contains 46 sections, 2 theorems, 41 equations, 3 figures, 5 tables.

Key Result

theorem thmcountertheorem

Let $s_0, \cdots, s_T$ be a random trajectory of the real vision-based system over the time horizon $T$ starting from initial state $s_0 \sim D_0$. This trajectory will be contained in the reachable tube $\hat{\mathcal{R}}_0(S_0),\cdots,\hat{\mathcal{R}}_T(S_0)$ constructed by world-model reachabili

Figures (3)

  • Figure 1: Overview of our closed-loop verification framework. Our world model is first trained to generate images from states and evaluated with conformal prediction to bound trajectory mismatch. In closed-loop safety verification, we iterate over each initial set to calculate reachable sets with Star-based approximation.
  • Figure 2: Pixel-level interval comparison under identical physical state for the pendulum benchmark. Both models use the same angle and angular velocity interval $s = [0.81,\,0.82] \times [0.040,\,0.041]$. The cGAN additionally introduces latent variables $z \in [-0.1,0.1]^2$, which injects extra perturbations into the generated images.
  • Figure 3: DWM Closed-loop verification with camera roll-out ground-truth across three benchmarks. Dark Green and Red indicate correct classification, while Light Green represents conservatism (False Positives: safe cell incorrectly labeled as unsafe).

Theorems & Definitions (6)

  • definition thmcounterdefinition: Closed-loop Vision-based System
  • definition thmcounterdefinition: System Trajectory
  • definition thmcounterdefinition: Reachable Set and Tube
  • theorem thmcountertheorem: Confident Reachable Tube Containment
  • corollary thmcountercorollary: Confident Real-System Reachability
  • proof