A Digital Twin Framework for Metamorphic Testing of Autonomous Driving Systems Using Generative Model

Tony Zhang, Burak Kantarci, Umair Siddique

TL;DR

This work tackles the oracle problem in autonomous driving system (ADS) testing by combining metamorphic testing (MT) with a digital twin, enabling generative scene variations that respect the operational design domain (ODD). The authors formalize an ODD-aware MT framework that uses an ODD definition $ODD=(P,E,O,T,C)$ and an environment set $E$ to constrain transformations, and they implement an ODD-aware generator $G(x,\tau_{ODD})$ that produces a transformed scene $x'$ while preserving core semantics. They introduce uncertainty-aware metamorphic relations and temporal analysis within an integrated validation framework, evaluated in the Udacity simulator with the DAVE-2 architecture and Stable Diffusion-XL. The framework achieves the highest metrics among the compared approaches (e.g., $TPR=0.719$, $F1=0.689$, $Precision=0.662$ for MR2) and improves test coverage over baselines such as Self-Oracle and DeepRoad. The proposed approach offers a scalable, high-fidelity pathway for systematic ADS safety verification, with possible extensions to additional metamorphic relations and to real-time deployment as generative-model efficiency improves.
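The idea of checking a metamorphic relation on a generated pair $(x, x')$ can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `check_mr`, the toy model, the toy generator, the `ALLOWED` set standing in for the environment set $E$, and the tolerance `epsilon`. The paper's actual relations and generator (Stable Diffusion-XL) are not reproduced; the sketch only shows the general shape of an MT check, which needs no ground-truth label for either scene.

```python
from typing import Callable, FrozenSet, List

Image = List[float]  # toy stand-in for an image (pixel intensities in [0, 1])


def check_mr(
    model: Callable[[Image], float],
    generator: Callable[[Image, str], Image],
    allowed: FrozenSet[str],
    x: Image,
    tau: str,
    epsilon: float,
) -> bool:
    """Return True if the prediction changes by at most epsilon under tau.

    Hypothetical relation: an ODD-permitted change of scene appearance
    should not change the predicted steering value by more than epsilon.
    """
    if tau not in allowed:
        # Transformations outside the allowed set are rejected, not tested.
        raise ValueError(f"transformation {tau!r} is outside the ODD")
    x_prime = generator(x, tau)  # plays the role of x' = G(x, tau_ODD)
    return abs(model(x) - model(x_prime)) <= epsilon


def toy_model(img: Image) -> float:
    # Assumption: a deliberately brightness-sensitive "steering" model.
    return sum(img) / len(img)


def toy_generator(img: Image, tau: str) -> Image:
    # Assumption: weather and lighting are imitated by scaling brightness.
    scale = {"rain": 0.9, "night": 0.5}[tau]
    return [p * scale for p in img]


ALLOWED = frozenset({"rain", "night"})
frame = [0.5, 0.5, 0.5, 0.5]

print(check_mr(toy_model, toy_generator, ALLOWED, frame, "rain", 0.1))   # True
print(check_mr(toy_model, toy_generator, ALLOWED, frame, "night", 0.1))  # False
```

In this toy setting the "night" case violates the relation, which is the kind of inconsistency a metamorphic test flags.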

Abstract

Ensuring the safety of self-driving cars remains a major challenge due to the complexity and unpredictability of real-world driving environments. Traditional testing methods face significant limitations, such as the oracle problem, which makes it difficult to determine whether a system's behavior is correct, and the inability to cover the full range of scenarios an autonomous vehicle may encounter. In this paper, we introduce a digital twin-driven metamorphic testing framework that addresses these challenges by creating a virtual replica of the self-driving system and its operating environment. By combining digital twin technology with AI-based image generative models such as Stable Diffusion, our approach enables the systematic generation of realistic and diverse driving scenes. This includes variations in weather, road topology, and environmental features, all while maintaining the core semantics of the original scenario. The digital twin provides a synchronized simulation environment where changes can be tested in a controlled and repeatable manner. Within this environment, we define three metamorphic relations inspired by real-world traffic rules and vehicle behavior. We validate our framework in the Udacity self-driving simulator and demonstrate that it significantly enhances test coverage and effectiveness. Our method achieves the highest true positive rate (0.719), F1 score (0.689), and precision (0.662) compared to baseline approaches. This paper highlights the value of integrating digital twins with AI-powered scenario generation to create a scalable, automated, and high-fidelity testing solution for autonomous vehicle safety.
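As a quick consistency check on the reported numbers, the F1 score is the harmonic mean of precision and recall, and the true positive rate is the same quantity as recall. The short sketch below recomputes F1 from the two reported values; the function name `f1_score` is only illustrative and is not from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall is the true positive rate)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 0.662, TPR 0.719.
f1 = f1_score(precision=0.662, recall=0.719)
print(round(f1, 3))  # 0.689
```

The recomputed value matches the reported F1 score of 0.689 to three decimal places.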

Paper Structure

This paper contains 22 sections, 13 equations, 4 figures, 2 tables, and 2 algorithms.

Figures (4)

  • Figure 1: The case of using a generative model to apply transformations to a real image.
  • Figure 2: Architecture of Digital Twin for metamorphic testing of ADS and its three key components: (1) Digital Twin Generation, (2) Integrated Validation, (3) Time Series Analysis.
  • Figure 3: Sample scenes from test dataset, which includes diverse driving scenarios across different times of day and weather conditions.
  • Figure 4: Distribution of successful crash predictions made by each MR variant across time leading up to the crash.