Risk Scenario Generation for Autonomous Driving Systems based on Causal Bayesian Networks
Jiangnan Zhao, Dehui Du, Xing Yu, Hang Li
TL;DR
This work introduces a framework based on Causal Bayesian Networks (CBN) for generating risk scenarios for autonomous driving systems (ADS). The CBN is built from the Maryland accident dataset, and the framework is evaluated through end-to-end testing in the CARLA simulator. CBN construction proceeds in four stages: data collection, integration of prior knowledge, causal discovery and learning, and validation with refutation tests. Scenario generation uses the CBN to focus on key static factors, applies clustering to capture representative risk patterns, and uses contract-based preconditions to reduce the number of iterations. In the experiments, the method produced 89 high-risk scenarios from 5 seed scenarios in 22 iterations, at around 30 seconds per scenario, outperforming baseline methods in efficiency. Building the CBN takes roughly 5 hours up front, but the network can be reused across multiple scenarios, which makes the approach practical for rapid, end-to-end ADS testing in simulation.
Abstract
Advancements in Autonomous Driving Systems (ADS) have brought significant benefits, but have also raised concerns regarding their safety. Virtual testing is common practice for ensuring the safety of ADS because it is more efficient and safer than field operational testing. However, capturing the complex dynamics of real-world driving environments and effectively generating risk scenarios for testing remain challenging. In this paper, we propose a new paradigm that uses Causal Bayesian Networks (CBN) for scenario generation in ADS testing. The CBN is built and validated using Maryland accident data, providing deeper insight into the many factors that influence autonomous driving behavior. Based on the constructed CBN, we propose an algorithm that significantly improves risk scenario generation, which in turn supports more effective testing and safer ADS. We also establish an end-to-end testing framework for ADS using the CARLA simulator. In our experiments, we generated 89 high-risk scenarios from 5 seed scenarios, outperforming baseline methods in terms of the time and number of iterations required.
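To make the idea of ranking scenario factors with a Bayesian network concrete, the following is a minimal sketch, not the paper's algorithm. The network structure (Weather -> Surface -> HighRisk <- Lighting), the variable names, and every probability are hypothetical values chosen for illustration; none are taken from the Maryland accident data. Because Weather and Lighting are root nodes in this toy graph, conditioning on them gives the same result as intervening on them.

```python
from itertools import product

# Hypothetical toy network: Weather -> Surface -> HighRisk <- Lighting.
# All probabilities below are made up for illustration only.
P_WEATHER = {"clear": 0.7, "rain": 0.3}
P_LIGHT = {"day": 0.6, "night": 0.4}
P_SURFACE = {  # P(surface | weather)
    "clear": {"dry": 0.9, "wet": 0.1},
    "rain": {"dry": 0.2, "wet": 0.8},
}
P_RISK = {  # P(high_risk = True | surface, light)
    ("dry", "day"): 0.05,
    ("dry", "night"): 0.15,
    ("wet", "day"): 0.20,
    ("wet", "night"): 0.45,
}
SURFACES = ("dry", "wet")


def joint(weather, light, surface, risk):
    """Joint probability from the network's factorisation."""
    p_risk = P_RISK[(surface, light)]
    return (
        P_WEATHER[weather]
        * P_LIGHT[light]
        * P_SURFACE[weather][surface]
        * (p_risk if risk else 1.0 - p_risk)
    )


def risk_given(weather, light):
    """P(high_risk | weather, light), marginalising over the surface state."""
    num = sum(joint(weather, light, s, True) for s in SURFACES)
    den = sum(
        joint(weather, light, s, r) for s in SURFACES for r in (True, False)
    )
    return num / den


def rank_scenarios():
    """Sort static-factor combinations from highest to lowest risk."""
    combos = product(P_WEATHER, P_LIGHT)
    return sorted(
        ((risk_given(w, l), w, l) for w, l in combos), reverse=True
    )


if __name__ == "__main__":
    for p, weather, light in rank_scenarios():
        print(f"{weather:>5} / {light:<5} P(high risk) = {p:.3f}")
```

In a sketch like this, the highest-ranked combinations of static factors would be the natural candidates to instantiate first in a simulator, which is the intuition behind using the network to guide scenario generation.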
