A Digital Twin Framework for Metamorphic Testing of Autonomous Driving Systems Using Generative Model
Tony Zhang, Burak Kantarci, Umair Siddique
TL;DR
This work tackles the oracle problem in autonomous driving system (ADS) testing by combining metamorphic testing (MT) with a digital twin, enabling ODD-aware variations of driving scenes produced by a generative model. The authors formalize an ODD-aware MT framework built on an operational design domain definition $ODD=(P,E,O,T,C)$, whose environment set $E$ constrains the admissible transformations. They implement an ODD-aware generator $G(x,\tau_{ODD})$ that produces a follow-up input $x'$ while preserving the core semantics of $x$. They also introduce uncertainty-aware metamorphic relations and temporal analysis within an integrated validation framework, and evaluate the approach in the Udacity simulator with the DAVE-2 architecture and Stable Diffusion XL. For MR2, the framework achieves the highest scores among the compared approaches ($TPR=0.719$, $F1=0.689$, $\text{Precision}=0.662$) and improves test coverage over baselines such as Self-Oracle and DeepRoad. The approach offers a scalable, high-fidelity pathway for systematic ADS safety verification; suggested extensions include additional metamorphic relations and real-time deployment as generative models become more efficient.
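To make the ODD-aware loop concrete, the sketch below gates a transformation on the environment set $E$ before calling a generator, then checks a metamorphic relation on the source and follow-up inputs. It is a minimal illustration under stated assumptions, not the authors' implementation: frames are flattened lists of floats, only $E$ is given meaning (the other ODD components are opaque sets), and the relation "steering changes by at most `eps`" is an assumed example rather than one of the paper's three MRs. The names `odd_aware_generate`, `check_steering_mr`, `run_test`, `toy_model`, and `toy_generator` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional

Image = List[float]  # hypothetical stand-in for a camera frame (flattened pixels)


@dataclass(frozen=True)
class ODD:
    """ODD = (P, E, O, T, C). Only E (the environment set) is used here;
    the other components are kept as opaque sets for illustration."""
    P: FrozenSet[str]
    E: FrozenSet[str]
    O: FrozenSet[str]
    T: FrozenSet[str]
    C: FrozenSet[str]


def odd_aware_generate(generator: Callable[[Image, str], Image],
                       x: Image, target_env: str, odd: ODD) -> Optional[Image]:
    """Hypothetical G(x, tau_ODD): apply the transformation only if the
    target environment lies inside the ODD's environment set E."""
    if target_env not in odd.E:
        return None  # out-of-ODD variation: no follow-up test case is produced
    return generator(x, target_env)


def check_steering_mr(model: Callable[[Image], float],
                      x: Image, x_prime: Image, eps: float) -> bool:
    """Assumed relation: predicted steering on x and x' differs by at most eps."""
    return abs(model(x) - model(x_prime)) <= eps


def run_test(model: Callable[[Image], float],
             generator: Callable[[Image, str], Image],
             x: Image, target_env: str, odd: ODD, eps: float) -> Optional[bool]:
    """Returns True (relation holds), False (violation), or None (skipped)."""
    x_prime = odd_aware_generate(generator, x, target_env, odd)
    if x_prime is None:
        return None
    return check_steering_mr(model, x, x_prime, eps)


# Toy stand-ins (hypothetical): a "model" that steers by mean brightness
# and a "generator" that darkens the frame for the requested environment.
def toy_model(img: Image) -> float:
    return sum(img) / len(img) - 0.5


def toy_generator(img: Image, env: str) -> Image:
    factor = {"rain": 0.9, "night": 0.4}.get(env, 1.0)
    return [p * factor for p in img]


odd = ODD(P=frozenset(), E=frozenset({"clear", "rain"}),
          O=frozenset(), T=frozenset(), C=frozenset())
frame = [0.6, 0.8, 0.7]
print(run_test(toy_model, toy_generator, frame, "rain", odd, eps=0.1))   # True
print(run_test(toy_model, toy_generator, frame, "night", odd, eps=0.1))  # None
```

Returning `None` for an out-of-ODD request keeps "not tested" distinct from "relation violated", so skipped variations do not inflate the false-positive count.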
Abstract
Ensuring the safety of self-driving cars remains a major challenge due to the complexity and unpredictability of real-world driving environments. Traditional testing methods face significant limitations: the oracle problem, which makes it difficult to determine whether a system's behavior is correct, and the inability to cover the full range of scenarios an autonomous vehicle may encounter. In this paper, we introduce a digital twin-driven metamorphic testing framework that addresses these challenges by creating a virtual replica of the self-driving system and its operating environment. By combining digital twin technology with AI-based generative image models such as Stable Diffusion, our approach enables the systematic generation of realistic and diverse driving scenes. These include variations in weather, road topology, and environmental features, all while maintaining the core semantics of the original scenario. The digital twin provides a synchronized simulation environment in which changes can be tested in a controlled and repeatable manner. Within this environment, we define three metamorphic relations inspired by real-world traffic rules and vehicle behavior. We validate our framework in the Udacity self-driving simulator and demonstrate that it significantly improves test coverage and effectiveness. Our method achieves the highest true positive rate (0.719), F1 score (0.689), and precision (0.662) among the evaluated approaches, including the baselines. This paper highlights the value of integrating digital twins with AI-powered scenario generation to create a scalable, automated, and high-fidelity testing solution for autonomous vehicle safety.
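The three reported scores are mutually consistent with the standard definition $F1 = 2PR/(P+R)$, where recall $R$ is the true positive rate. The short check below assumes the paper uses this standard (non-averaged) F1. It also shows how all three metrics follow from confusion-matrix counts; the helper names and the counts in the usage lines are hypothetical and are not the paper's data.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall is the TPR)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Precision, TPR, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "tpr": tpr,
            "f1": f1_from_precision_recall(precision, tpr)}


# Consistency check against the reported precision (0.662) and TPR (0.719).
print(round(f1_from_precision_recall(0.662, 0.719), 3))  # 0.689

# Hypothetical counts, for illustration only.
print(metrics_from_counts(tp=8, fp=2, fn=2))
```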
