ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
TL;DR
ChatScene introduces a retrieval-augmented, LLM-based agent that converts textual descriptions of safety-critical driving scenarios into executable Scenic code for CARLA simulations. It builds a scalable retrieval database of Scenic snippets and uses embedding-based retrieval to assemble complete Scenic scripts from natural-language descriptions, enabling diverse, adversarial test scenarios. Empirically, ChatScene generates more challenging scenarios (higher collision rates) and improves ego-vehicle robustness after adversarial fine-tuning, outperforming baselines on SafeBench metrics and ADE diversity. The approach offers a practical, extensible framework for comprehensive AV safety evaluation, with potential extensions to multimodal inputs such as text, image, and video.
Abstract
We present ChatScene, a Large Language Model (LLM)-based agent that leverages the capabilities of LLMs to generate safety-critical scenarios for autonomous vehicles. Given unstructured language instructions, the agent first generates textually described traffic scenarios using LLMs. These scenario descriptions are subsequently broken down into several sub-descriptions for specified details such as behaviors and locations of vehicles. The agent then distinctively transforms the textually described sub-scenarios into domain-specific languages, which then generate actual code for prediction and control in simulators, facilitating the creation of diverse and complex scenarios within the CARLA simulation environment. A key part of our agent is a comprehensive knowledge retrieval component, which efficiently translates specific textual descriptions into corresponding domain-specific code snippets by training a knowledge database containing the scenario description and code pairs. Extensive experimental results underscore the efficacy of ChatScene in improving the safety of autonomous vehicles. For instance, the scenarios generated by ChatScene show a 15% increase in collision rates compared to state-of-the-art baselines when tested against different reinforcement learning-based ego vehicles. Furthermore, we show that by using our generated safety-critical scenarios to fine-tune different RL-based autonomous driving models, they can achieve a 9% reduction in collision rates, surpassing current SOTA methods. ChatScene effectively bridges the gap between textual descriptions of traffic scenarios and practical CARLA simulations, providing a unified way to conveniently generate safety-critical scenarios for safety testing and improvement for AVs.
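The retrieval step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the component names (`behavior`, `location`), the database entries, and the placeholder snippet strings are hypothetical, and a toy bag-of-words vector stands in for the learned text embeddings a real system would use. It is not the authors' implementation; it only shows the general pattern of matching each sub-description to its nearest stored description and concatenating the paired snippets.

```python
import math
import re
from collections import Counter

# Hypothetical (description, snippet) pairs. The snippet strings are
# placeholders, not real Scenic code from ChatScene.
BRAKE = "# <Scenic behavior snippet: sudden brake>"
CROSS = "# <Scenic behavior snippet: pedestrian crossing>"
SAME_LANE = "# <Scenic location snippet: same lane ahead>"
INTERSECTION = "# <Scenic location snippet: intersection, from left>"

DATABASE = {
    "behavior": [
        ("the adversarial vehicle suddenly brakes in front of the ego vehicle", BRAKE),
        ("the pedestrian crosses the road as the ego vehicle approaches", CROSS),
    ],
    "location": [
        ("straight road with the adversarial vehicle in the same lane ahead of the ego vehicle", SAME_LANE),
        ("four way intersection with the adversarial vehicle approaching from the left", INTERSECTION),
    ],
}


def embed(text):
    """Toy embedding: word counts (a stand-in for a learned text encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(component, query):
    """Return the snippet whose stored description is closest to the query."""
    q = embed(query)
    _, snippet = max(
        DATABASE[component], key=lambda pair: cosine(q, embed(pair[0]))
    )
    return snippet


def assemble(sub_descriptions):
    """Concatenate one retrieved snippet per sub-description, in input order."""
    return "\n".join(retrieve(c, d) for c, d in sub_descriptions.items())


if __name__ == "__main__":
    script = assemble(
        {
            "location": "the other car is ahead in the same lane",
            "behavior": "a car ahead brakes suddenly",
        }
    )
    print(script)
```

Keeping one small database per component means each sub-description only competes with entries of its own kind, so a new behavior or location can be added without touching the others.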
