Multimodal Large Language Model Driven Scenario Testing for Autonomous Vehicles

Qiujing Lu, Xuanhan Wang, Yiwei Jiang, Guangming Zhao, Mingyue Ma, Shuo Feng

TL;DR

This work tackles the challenge of efficiently generating diverse, realistic corner-case scenarios for autonomous vehicle testing. It introduces OmniTester, a multimodal LLM-driven pipeline that creates road networks and vehicle trajectories from text prompts, integrating SUMO for simulation, RAG for knowledge grounding, and self-improvement loops to reduce hallucinations. Key contributions include a two-stage generation framework (road network then vehicle routes), detailed prompt engineering with CoT strategies, and an evaluation showing high controllability and realism, plus crash-report–driven reconstruction capabilities. The approach holds practical significance for scalable AV safety testing by enabling targeted, diverse scenarios beyond pre-existing datasets.

Abstract

The generation of corner cases has become increasingly crucial for efficiently testing autonomous vehicles prior to road deployment. However, existing methods struggle to accommodate diverse testing requirements and often fail to generalize to unseen situations, which reduces the convenience and usability of the generated scenarios. A method that facilitates easily controllable scenario generation for efficient autonomous vehicle (AV) testing with realistic and challenging situations is greatly needed. To address this, we propose OmniTester: a multimodal Large Language Model (LLM) based framework that fully leverages the extensive world knowledge and reasoning capabilities of LLMs. OmniTester is designed to generate realistic and diverse scenarios within a simulation environment, offering a robust solution for testing and evaluating AVs. In addition to prompt engineering, we employ tools from Simulation of Urban Mobility (SUMO) to reduce the complexity of the code the LLM must generate. Furthermore, we incorporate Retrieval-Augmented Generation (RAG) and a self-improvement mechanism to enhance the LLM's understanding of scenarios, thereby increasing its ability to produce more realistic scenes. In the experiments, we demonstrate the controllability and realism of our approach in generating three types of challenging and complex scenarios. Additionally, we showcase its effectiveness in reconstructing new scenarios described in crash reports, driven by the generalization capability of LLMs.

Paper Structure

This paper contains 17 sections, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: The LLM generation framework of OmniTester
  • Figure 2: Dataflow within OmniTester: The Interpreter, RAG module, Net Generator, Vehicle Generator and LLM Evaluator are activated upon the user's request.
  • Figure 3: Road XML generation as a two-step process: first, a properly prompted LLM directly generates the node and edge files; then SUMO converts these into the corresponding net file. All files are in XML format.
  • Figure 4: Self-improvement feedback loops for route generation.
  • Figure 5: Sampled realistic intersections generated by OmniTester. Left: description generated by the Interpreter; Middle: SUMO visualization of the road structure from the net XML file; Right: similar roads found in the real world.
  • ...and 6 more figures
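
The two-step road generation described in the Figure 3 caption (node and edge files first, then conversion to a net file with SUMO) can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`build_nodes_xml`, `build_edges_xml`, `netconvert_command`), the example geometry, and the file names are hypothetical, and the node and edge contents are hard-coded here in place of LLM output. The only SUMO pieces assumed are the plain-XML node/edge formats and the `netconvert` options `--node-files`, `--edge-files`, and `--output-file`.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def build_nodes_xml(nodes):
    """Serialize {node_id: (x, y)} into SUMO plain-XML node format."""
    root = ET.Element("nodes")
    for node_id, (x, y) in nodes.items():
        ET.SubElement(root, "node", {"id": node_id, "x": str(x), "y": str(y)})
    return ET.tostring(root, encoding="unicode")


def build_edges_xml(edges, num_lanes=1, speed=13.89):
    """Serialize [(from_id, to_id), ...] into SUMO plain-XML edge format."""
    root = ET.Element("edges")
    for src, dst in edges:
        ET.SubElement(root, "edge", {
            "id": f"{src}to{dst}",       # hypothetical naming scheme
            "from": src,
            "to": dst,
            "numLanes": str(num_lanes),
            "speed": str(speed),          # metres per second
        })
    return ET.tostring(root, encoding="unicode")


def netconvert_command(node_file, edge_file, net_file):
    """Build the netconvert call that turns node/edge files into a net file."""
    return [
        "netconvert",
        f"--node-files={node_file}",
        f"--edge-files={edge_file}",
        f"--output-file={net_file}",
    ]


if __name__ == "__main__":
    # Step 1: in OmniTester these files would come from the prompted LLM;
    # here a simple two-way link between two nodes stands in for that output.
    nodes = {"west": (-100, 0), "center": (0, 0)}
    edges = [("west", "center"), ("center", "west")]
    Path("demo.nod.xml").write_text(build_nodes_xml(nodes))
    Path("demo.edg.xml").write_text(build_edges_xml(edges, num_lanes=2))

    # Step 2: let SUMO build the net file, if netconvert is installed.
    cmd = netconvert_command("demo.nod.xml", "demo.edg.xml", "demo.net.xml")
    if shutil.which("netconvert"):
        subprocess.run(cmd, check=True)
    else:
        print("netconvert not found; would run:", " ".join(cmd))
```

Keeping the generated artifacts to short node and edge lists, and delegating lane geometry and junction logic to `netconvert`, is consistent with the abstract's point that SUMO tools reduce the complexity of what the LLM has to produce.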