Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model

Qiujing Lu, Meng Ma, Ximiao Dai, Xuanhan Wang, Shuo Feng

TL;DR

AutoScenario tackles the challenge of generating realistic, diverse, and controllable corner cases for autonomous vehicles with a pipeline powered by a multimodal LLM: it translates real-world data of several modalities into textual scene descriptions, then drives SUMO and CARLA to construct simulation-ready scenarios. The approach encodes inputs into a unified linguistic space and decodes them into road-network, agent, and object configurations guided by domain knowledge, enabling tailored testing scenarios built from text, images, and videos, including crash reports. Experiments show high conformity, broad diversity, and the ability to produce novel corner cases; ablations confirm the necessity of the interpreters, chain-of-thought prompting, and prior knowledge. The work advances scenario-based AV testing by bridging narrative requirements and detailed simulation configurations, and points to future work on photorealism and online learning to better support corner-case discovery.
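The decoding step described above, from a unified textual description into network, agent, and object configurations, could be represented with typed containers. The following Python sketch assumes, hypothetically, that the LLM is prompted to answer in JSON with top-level keys `network`, `agents`, and `objects`; the schema, the field names, and the `decode_scenario` helper are illustrative assumptions and are not taken from the paper.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    agent_id: str
    kind: str          # e.g. "vehicle" or "pedestrian"
    route: List[str]   # ordered edge IDs in the road network
    depart: float      # departure time in seconds


@dataclass
class ScenarioConfig:
    network: dict                                       # road-network parameters
    agents: List[Agent] = field(default_factory=list)
    objects: List[dict] = field(default_factory=list)   # static objects, e.g. cones


# Hypothetical schema the LLM is assumed to follow
REQUIRED_KEYS = {"network", "agents", "objects"}


def decode_scenario(llm_output: str) -> ScenarioConfig:
    """Parse an (assumed) JSON answer from the LLM into a typed config."""
    data = json.loads(llm_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        # Reject incomplete answers instead of silently filling defaults
        raise ValueError(f"missing keys: {sorted(missing)}")
    agents = [
        Agent(a["id"], a["kind"], list(a["route"]), float(a["depart"]))
        for a in data["agents"]
    ]
    return ScenarioConfig(network=dict(data["network"]), agents=agents,
                          objects=list(data["objects"]))
```

Validating the model's answer before it reaches a simulator keeps malformed generations from failing later, inside SUMO or CARLA, where errors are harder to trace back to the prompt.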

Abstract

To guarantee the safety and reliability of autonomous vehicle (AV) systems, corner cases play a crucial role in exploring the system's behavior under rare and challenging conditions within simulation environments. However, current approaches often fall short in meeting diverse testing needs and struggle to generalize to novel, high-risk scenarios that closely mirror real-world conditions. To tackle this challenge, we present AutoScenario, a multimodal Large Language Model (LLM)-based framework for realistic corner case generation. It converts safety-critical real-world data from multiple sources into textual representations, enabling the generalization of key risk factors while leveraging the extensive world knowledge and advanced reasoning capabilities of LLMs. Furthermore, it integrates tools from the Simulation of Urban Mobility (SUMO) and CARLA simulators to simplify and execute the code generated by LLMs. Our experiments demonstrate that AutoScenario can generate realistic and challenging test scenarios, precisely tailored to specific testing requirements or textual descriptions. Additionally, we validate its ability to produce diverse and novel scenarios derived from multimodal real-world data involving risky situations, harnessing the powerful generalization capabilities of LLMs to effectively simulate a wide range of corner cases.
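The abstract notes that AutoScenario relies on SUMO tooling to execute LLM-generated output. As a hedged illustration of what such a hand-off might involve, the sketch below serializes agent routes into a SUMO route file (`.rou.xml`) using the standard `routes`, `vType`, `route`, and `vehicle` elements. The helper name `build_route_file`, the single shared vehicle type, and its kinematic values are assumptions for illustration only, not the paper's implementation.

```python
import xml.etree.ElementTree as ET


def build_route_file(vehicles, accel=2.6, decel=4.5, max_speed=13.9):
    """Return SUMO route-file XML for (vehicle_id, edge_ids, depart_s) tuples.

    Hypothetical helper: one shared vehicle type; 13.9 m/s is roughly 50 km/h.
    """
    root = ET.Element("routes")
    # One shared vehicle type; the kinematic values are illustrative defaults
    ET.SubElement(root, "vType", id="car", accel=str(accel), decel=str(decel),
                  maxSpeed=str(max_speed), length="5.0")
    # SUMO expects vehicles in a route file to be ordered by departure time
    for vid, edges, depart in sorted(vehicles, key=lambda v: v[2]):
        route_id = f"route_{vid}"
        # A route is a space-separated list of edge IDs from the network
        ET.SubElement(root, "route", id=route_id, edges=" ".join(edges))
        ET.SubElement(root, "vehicle", id=vid, type="car", route=route_id,
                      depart=f"{depart:.2f}")
    return ET.tostring(root, encoding="unicode")
```

The returned XML could be saved as, say, `scenario.rou.xml` and run against an existing network with `sumo -n <network>.net.xml -r scenario.rou.xml`; the edge IDs must exist in that network for SUMO to load the routes.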

Paper Structure

This paper contains 21 sections, 1 equation, 18 figures, and 7 tables.

Figures (18)

  • Figure 1: AutoScenario: an LLM-based framework for automated generation of realistic corner cases.
  • Figure 2: AutoScenario system overview: it accepts multimodal inputs, which are processed by the Multimodal Interpreter. Based on the generalized scenario description, the Components Generator is activated to build key components, after which the Scenario Generator is used for scenario testing.
  • Figure 3: Tools utilized in AutoScenario: SUMO, CARLA and data-driven models.
  • Figure 4: Left: AutoScenario generation using crash reports from NHTSA [NHTSA2023] as input. Right: the generated scene at the moment before the accident.
  • Figure 5: Given the same output from the image interpreter, AutoScenario can generate diverse scenarios.
  • ...and 13 more figures
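Figure 2's three-stage flow (Multimodal Interpreter, then Components Generator, then Scenario Generator) can be sketched as a small orchestration layer. In this hypothetical Python sketch each stage is a plain callable standing in for an LLM-driven module; the `Pipeline` class, the modality keys, and the stage signatures are assumptions, not the paper's interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stage signatures; in AutoScenario each stage is LLM-driven
Interpreter = Callable[[object], str]            # raw input -> scene description
ComponentsGenerator = Callable[[str], dict]      # description -> key components
ScenarioGenerator = Callable[[str, dict], dict]  # description + components -> scenario


@dataclass
class Pipeline:
    interpreters: Dict[str, Interpreter]  # one interpreter per input modality
    build_components: ComponentsGenerator
    build_scenario: ScenarioGenerator

    def run(self, modality: str, raw_input: object) -> dict:
        if modality not in self.interpreters:
            raise KeyError(f"no interpreter for modality {modality!r}")
        # Stage 1: map the input into the shared textual space
        description = self.interpreters[modality](raw_input)
        # Stage 2: derive key components from the generalized description
        components = self.build_components(description)
        # Stage 3: assemble the test scenario from description and components
        return self.build_scenario(description, components)


# Usage with trivial stand-ins for the LLM-driven stages
pipe = Pipeline(
    interpreters={"text": lambda s: s.strip().lower()},
    build_components=lambda d: {
        "network": "intersection" if "intersection" in d else "straight",
        "agents": ["ego", "npc"],
        "objects": [],
    },
    build_scenario=lambda d, c: {"description": d, **c},
)
scenario = pipe.run("text", "  Left turn at an INTERSECTION ")
```

Keeping every stage behind a callable means an image or video interpreter can be added per modality without touching the downstream generators, since all of them consume the same textual description.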