
Generating Traffic Scenarios via In-Context Learning to Learn Better Motion Planner

Aizierjiang Aiersilan

TL;DR

This work tackles the scarcity and cost of collecting diverse safety-critical traffic scenarios for autonomous-vehicle motion planning. It introduces AutoSceneGen, a universal, cost-efficient framework that uses in-context learning (ICL) with large language models to convert user-described scenarios into executable simulator configurations (e.g., CARLA) without training new models. The approach includes a robust pipeline with input filtering, exemplar-driven ICL, and a validator that ensures simulator compatibility, enabling automated generation of rich, rare, and open-world scenarios. Empirical results show that motion planners trained on AutoSceneGen data, alone or in combination with real datasets, achieve lower average and final displacement errors (ADE/FDE) in trajectory prediction, demonstrating the practical value of synthetic, diverse training data for safety-critical evaluation. Overall, AutoSceneGen offers scalable, end-to-end capabilities for rapid scenario generation and safety testing of AVs in open-world environments, with broad implications for reliability and accident reconstruction.
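The input-filtering and validation steps named above both come down to mapping free-text terms onto a vocabulary the simulator accepts. The following is a minimal Python sketch under stated assumptions: `SUPPORTED_TERMS`, `SYNONYMS`, `filter_description`, and `validate_terms` are hypothetical names, the vocabulary is illustrative rather than CARLA's actual API surface, and only the "storm" to "rain" substitution comes from this paper (Figure 2).

```python
import re

# Hypothetical subset of terms a simulator accepts (an assumption,
# not CARLA's real list of weather or time-of-day options).
SUPPORTED_TERMS = {"rain", "fog", "clear", "cloudy", "noon", "night", "sunset"}

# Hypothetical synonym table mapping unsupported terms to supported ones.
# Only "storm" -> "rain" is taken from the paper's Figure 2 caption.
SYNONYMS = {"storm": "rain", "drizzle": "rain", "mist": "fog"}


def filter_description(description: str) -> str:
    """Filter step: replace known incompatible words with supported alternatives."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Words without a synonym entry are left unchanged.
        return SYNONYMS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, description)


def validate_terms(terms: list[str]) -> list[str]:
    """Validator step: keep supported terms, map synonyms, drop the rest."""
    validated = []
    for term in terms:
        term = term.lower()
        if term in SUPPORTED_TERMS:
            validated.append(term)
        elif SYNONYMS.get(term) in SUPPORTED_TERMS:
            validated.append(SYNONYMS[term])
        # Any other term is ignored so it cannot cause a runtime error.
    return validated
```

For example, `filter_description("Heavy storm at night")` yields `"Heavy rain at night"`, and `validate_terms(["storm", "noon", "tornado"])` yields `["rain", "noon"]`, dropping the unknown term instead of letting it reach the simulator. A production validator would inspect API calls in the generated configuration rather than bare words.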

Abstract

Motion planning is a crucial component in autonomous driving. State-of-the-art motion planners are trained on meticulously curated datasets, which are not only expensive to annotate but also insufficient for capturing rarely seen critical scenarios. Failing to account for such scenarios poses a significant risk to motion planners and may lead to incidents during testing. An intuitive solution is to manually compose such scenarios by programming and executing a simulator (e.g., CARLA). However, this approach incurs substantial human costs. Motivated by this, we propose an inexpensive method for generating diverse critical traffic scenarios to train more robust motion planners. First, we represent traffic scenarios as scripts, which the simulator then uses to generate traffic scenarios. Next, we develop a method that accepts user-specified text descriptions, which a Large Language Model translates into scripts using in-context learning. The output scripts are sent to the simulator, which produces the corresponding traffic scenarios. As our method can generate abundant safety-critical traffic scenarios, we use them as synthetic training data for motion planners. To demonstrate the value of the generated scenarios, we train existing motion planners on our synthetic data, on real-world datasets, and on a combination of both. Our experiments show that motion planners trained with our data significantly outperform those trained solely on real-world data, showing the usefulness of our synthetic data and the effectiveness of our data generation method. Our source code is available at https://ezharjan.github.io/AutoSceneGen.
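The translation step, in which an LLM turns a user description into a simulator script by in-context learning, can be pictured as prompt assembly: k demonstration pairs of (description, script) precede the new description, and the model completes the script. This is a minimal sketch, assuming a hypothetical `Exemplar` type, prompt wording, and script syntax; it only builds the prompt string and does not call any LLM or simulator.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One hypothetical (description, script) demonstration pair."""
    description: str
    script: str


INSTRUCTION = "Translate the traffic scenario description into a simulator script."


def build_icl_prompt(description: str, exemplars: list[Exemplar], k: int) -> str:
    """Assemble a k-shot prompt: k=0 is zero-shot, k=1 one-shot, k>1 few-shot."""
    if k < 0 or k > len(exemplars):
        raise ValueError("k must be between 0 and the number of exemplars")
    parts = [INSTRUCTION]
    # Each demonstration shows a description followed by its script.
    for ex in exemplars[:k]:
        parts.append(f"Description: {ex.description}\nScript:\n{ex.script}")
    # The query ends with an open "Script:" slot for the LLM to complete.
    parts.append(f"Description: {description}\nScript:")
    return "\n\n".join(parts)


demo = [Exemplar("Clear noon, 5 pedestrians walking.",
                 "weather = 'clear_noon'\npedestrians = 5")]
prompt = build_icl_prompt("Rainy night, 10 pedestrians running.", demo, k=1)
```

Setting `k=0`, `k=1`, or `k>1` gives the zero-shot, one-shot, or few-shot variants mentioned in the Figure 2 caption; how k is chosen for a given scenario is left to the caller in this sketch.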

Paper Structure

This paper contains 12 sections, 3 figures, 3 tables, and 2 algorithms.

Figures (3)

  • Figure 1: Images captured at four distinct timestamps and locations, corresponding to the AutoSceneGen input scenario description: "In downtown area, during a drizzly noon, there are vehicles malfunctioning windshield wipers and some of the vehicles' doors are open. Some vehicles exhibit negligent driving behavior, compromising visibility in wet conditions. There are 10 pedestrians on the road, with 50% of the pedestrian running. No one was hurt and no accident happened since all the vehicles except the malfunctioning one obeyed the traffic rules."
  • Figure 2: Architecture overview. The pipeline begins with the user entering a scenario description, which the Exception Handler screens to block adversarial or irrelevant inputs, keeping the framework within scope and preventing downstream issues. The Filter then processes the description, replacing simulator-incompatible terms with terms aligned to the simulator's documented APIs. The filtered description (Desc.') is combined with pre-constructed ICL exemplars, which can be zero-shot, one-shot, or few-shot depending on the LLM's familiarity with the simulator's APIs and the complexity of the scenario. The LLM generates a response containing scenario configurations, often accompanied by explanations and comments. The Validator checks each API call for compatibility and either replaces unsupported terms with suitable alternatives (e.g., replacing "storm," which CARLA does not support, with "rain") or ignores them to prevent errors. This ensures that all calls align with the simulator's capabilities, so the final configuration file can be executed. The simulator then runs the scenario; the final step depicts the interaction between the real world and the virtual environment, and data collection can take place either inside the simulator (as in this study) or externally.
  • Figure 3: Comparison of all metrics across epochs for motion planners trained on data collected via AutoSceneGen (blue), on ApolloScapes (orange), and on the combination of the two datasets (green). While the dataset collected purely from AutoSceneGen outperforms ApolloScapes in some epochs, such as epoch 127, the combination of AutoSceneGen and ApolloScapes gives better overall results. Because the two datasets have distinct distributions of traffic participants, panels (b), (c), and (d) show sharper peaks for FDE-vehicle and ADE-vehicle; nevertheless, the combined dataset achieves reasonable values overall. In this experiment, ApolloScapes has a total of 3,917 frames, AutoSceneGen has 17,919 frames, and the combined AutoSceneGen + ApolloScapes dataset has 27,605 frames.
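The ADE and FDE metrics plotted in Figure 3 follow standard definitions: ADE is the Euclidean error between predicted and ground-truth positions averaged over all timesteps, and FDE is that error at the final timestep only. Below is a minimal pure-Python sketch, assuming one predicted trajectory per agent (no best-of-k selection) and 2-D positions; the function name `ade_fde` is hypothetical, and the paper's exact evaluation protocol is not specified here. A per-class variant such as ADE-vehicle would presumably apply the same computation to vehicle agents only.

```python
import math


def ade_fde(pred, truth):
    """Compute (ADE, FDE) for matching lists of 2-D trajectories.

    pred, truth: lists of agents, each a list of (x, y) points over time.
    ADE averages the Euclidean error over all agents and timesteps;
    FDE averages it over agents at the final timestep only.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("need the same non-zero number of agents")
    step_errors, final_errors = [], []
    for p_traj, t_traj in zip(pred, truth):
        if len(p_traj) != len(t_traj) or not p_traj:
            raise ValueError("trajectories must have the same non-zero length")
        # Per-timestep Euclidean distance between prediction and ground truth.
        errors = [math.hypot(px - tx, py - ty)
                  for (px, py), (tx, ty) in zip(p_traj, t_traj)]
        step_errors.extend(errors)
        final_errors.append(errors[-1])
    ade = sum(step_errors) / len(step_errors)
    fde = sum(final_errors) / len(final_errors)
    return ade, fde
```

For a single agent predicted at (0, 0), (1, 0), (2, 0) against ground truth (0, 0), (1, 1), (2, 2), the per-step errors are 0, 1, and 2, so ADE = 1.0 and FDE = 2.0.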