Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation

Hanlin Tian; Kethan Reddy; Yuxiang Feng; Mohammed Quddus; Yiannis Demiris; Panagiotis Angeloudis

TL;DR

CRITICAL presents a closed-loop framework that jointly generates safety-critical driving scenarios and trains AV agents, using traffic data derived from the highD dataset, driving-behavior clustering, surrogate risk metrics, and optional LLM-guided scenario refinement. By exposing PPO-based agents in HighwayEnv to diverse, risk-focused configurations, the approach improves sample efficiency and robustness, with time-to-collision (TTC) and a unified risk index guiding criticality assessment. Empirical results show faster learning, higher rewards, and fewer crashes, with further gains when LLM analysis via LangChain and Mistral-7B-Instruct is added. The work demonstrates the potential of combining data-driven scenario generation with language-model reasoning to improve AV safety validation and accelerate RL-based development.
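TTC is one of the criticality signals named above. As a minimal sketch, assuming a single-lane, longitudinal-only setting, TTC between a following vehicle and the vehicle ahead is the gap divided by the closing speed, and it is infinite when the follower is not closing in. The 1.5 s near-miss threshold and the function names are hypothetical illustrations, not values or code from the paper.

```python
import math
from typing import List, Tuple

# Hypothetical near-miss threshold in seconds (not taken from the paper).
NEAR_MISS_TTC_S = 1.5


def time_to_collision(gap_m: float, v_follow_mps: float, v_lead_mps: float) -> float:
    """Longitudinal TTC: gap divided by closing speed.

    Returns math.inf when the follower is not closing in on the leader.
    """
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def count_near_misses(
    steps: List[Tuple[float, float, float]],
    threshold_s: float = NEAR_MISS_TTC_S,
) -> int:
    """Count timesteps whose TTC falls below the threshold.

    Each step is (gap_m, v_follow_mps, v_lead_mps) for the ego vehicle
    and the vehicle directly ahead of it.
    """
    return sum(
        1 for gap, v_f, v_l in steps if time_to_collision(gap, v_f, v_l) < threshold_s
    )


if __name__ == "__main__":
    episode = [
        (40.0, 30.0, 25.0),  # TTC = 8.0 s
        (20.0, 30.0, 25.0),  # TTC = 4.0 s
        (6.0, 30.0, 25.0),   # TTC = 1.2 s, counted as a near miss
        (10.0, 25.0, 28.0),  # gap is opening, TTC = inf
    ]
    print(count_near_misses(episode))  # 1
```

Counting threshold crossings per episode gives a per-configuration tally of the kind used to compare the baseline against critical case generation.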

Abstract

This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL generates diverse scenarios that focus on critical driving situations, targeting specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. It does so by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. We show that establishing a closed feedback loop between the data generation pipeline and the training process can accelerate learning during training, improve overall system performance, and increase safety resilience. Our evaluations, conducted with Proximal Policy Optimization (PPO) in the HighwayEnv simulation environment, demonstrate noticeable performance improvements when critical case generation and LLM analysis are integrated, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to hasten the development of AV agents, broaden the scope of RL training, and strengthen validation efforts for AV safety.
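The closed feedback loop described above (score scenarios during training, then regenerate scenarios around the riskiest configurations) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the configuration keys, the top-fraction selection rule, and the random perturbation step are swapped in for CRITICAL's own generation step (highD-based behavior clustering and optional LLM suggestions), and `evaluate_risk` is a hypothetical stand-in for PPO training episodes scored with TTC, the unified risk index, or crashes.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical scenario parameters, e.g. {"vehicles_density": 1.2, "mean_speed_mps": 28.0}.
Config = Dict[str, float]


def select_critical_configs(
    scored: List[Tuple[Config, float]], top_fraction: float = 0.2
) -> List[Config]:
    """Keep the configurations with the highest risk scores (at least one)."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return [cfg for cfg, _ in ranked[:k]]


def perturb(cfg: Config, rng: random.Random, scale: float = 0.1) -> Config:
    """Propose a nearby configuration by scaling each parameter by up to +/- scale."""
    return {key: value * (1.0 + rng.uniform(-scale, scale)) for key, value in cfg.items()}


def closed_loop(
    initial: List[Config],
    evaluate_risk: Callable[[Config], float],
    rounds: int = 3,
    seed: int = 0,
) -> List[Config]:
    """Alternate between scoring the scenario pool and regenerating it around risky configs."""
    rng = random.Random(seed)
    pool = list(initial)
    for _ in range(rounds):
        # Placeholder for training episodes: one risk score per configuration.
        scored = [(cfg, evaluate_risk(cfg)) for cfg in pool]
        critical = select_critical_configs(scored)
        # Keep the riskiest configs and refill the pool with perturbations of them.
        pool = critical + [
            perturb(rng.choice(critical), rng) for _ in range(len(pool) - len(critical))
        ]
    return pool


if __name__ == "__main__":
    start = [{"vehicles_density": 0.5 + 0.15 * i, "mean_speed_mps": 25.0} for i in range(10)]
    # Toy risk proxy: denser traffic is treated as riskier.
    final = closed_loop(start, evaluate_risk=lambda cfg: cfg["vehicles_density"])
    print(sorted(round(c["vehicles_density"], 2) for c in final))
```

Keeping the selected configurations in the pool (rather than replacing them) means the riskiest scenarios found so far are never lost between rounds, while the perturbations add diversity around them.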

Paper Structure (16 sections, 7 equations, 5 figures, 2 tables)

INTRODUCTION
RELATED WORK
Scenario Generation for Autonomous Vehicle
Large Language Models in Autonomous Vehicles
Critical Scenarios
METHODOLOGY
Reinforcement Learning for Autonomous Vehicle Environment
highD Dataset for Realistic Traffic Simulation
Risk Metrics to Measure Safety-Criticality
EXPERIMENTS AND RESULTS
Transforming highD Data for Scenario Generation
Prompting the LLM
Computational Setup
Evaluation and Results
Critical Scenario Generation: Effects on RL-Agent Performance & Criticality Measurements
...and 1 more section

Figures (5)

Figure 1: An illustration of the general algorithmic flow of CRITICAL. An RL agent is exposed to a scenario (a function of the environment configuration), and after a designated number of episodes, we collate pertinent environment outputs (yellow) from every episode. These outputs can then be used to directly generate a new scenario, or first be parsed into a prompt and fed into an LLM, which suggests an alternative environment configuration (blue) for instantiating scenarios.
Figure 2: Edge-case scenarios (yellow) are defined by their rarity in a given operational design domain (ODD); we depict them as scenarios whose number of occurrences N falls below a threshold value. Boundary scenarios (blue) have a criticality measure $\Gamma$ beyond a threshold value. Critical scenarios (green) are defined as scenarios that have the union of these two circumstances.
Figure 3: An architecture diagram mapping out the components of CRITICAL. The framework first sets up an environment configuration based on typical real-world traffic from the highD dataset [highDdataset]. These configurations are then used to generate HighwayEnv [highway-env] scenarios. At the end of each episode, we collect data including failure reports, risk metrics, and rewards, and repeat this process multiple times to gather a collection of configuration files with associated scenario risk assessments. To enhance RL training, we analyze the distribution of configurations with respect to the risk metrics, identifying those conducive to critical scenarios. We then either use these configurations directly for new scenarios or prompt an LLM to generate critical scenarios.
Figure 4: Training loss comparison for Vanilla PPO (Blue), PPO with Critical Case Generation (Orange), and PPO with Critical Case Generation and Large Language Model Analysis (Green).
Figure 5: The distributions on the right (green) are the configurations quantified by $TTC$ near-miss counts, while the distributions on the left (blue) are the configurations quantified by unified risk index $r$ threshold counts. The top row shows the baseline, the middle row corresponds to Critical Case Generation without LLM, and the bottom row to LLM-Enhanced Critical Case Generation.
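Figures 1 and 3 describe an optional path in which collated episode outputs are parsed into a prompt and an LLM suggests a new environment configuration. The sketch below shows only the text plumbing around that step, under stated assumptions: the prompt wording, the episode fields (`crashed`, `near_misses`, `reward`), and the request for a JSON reply are hypothetical, and the actual model call (LangChain with Mistral-7B-Instruct in the paper) is deliberately left out.

```python
import json
from typing import Dict, List, Set


def build_prompt(config: Dict[str, float], episodes: List[Dict[str, float]]) -> str:
    """Summarize hypothetical episode outputs and ask for a more critical configuration."""
    n = len(episodes)
    crash_rate = sum(ep["crashed"] for ep in episodes) / n
    mean_near_misses = sum(ep["near_misses"] for ep in episodes) / n
    mean_reward = sum(ep["reward"] for ep in episodes) / n
    return (
        "You are helping design safety-critical highway driving scenarios.\n"
        f"Current environment configuration: {json.dumps(config, sort_keys=True)}\n"
        f"Over {n} episodes: crash rate {crash_rate:.2f}, "
        f"mean TTC near misses {mean_near_misses:.1f}, mean reward {mean_reward:.2f}.\n"
        "Suggest a modified configuration that is more likely to produce critical "
        "situations. Reply with a single JSON object using the same keys."
    )


def parse_suggestion(reply: str, allowed_keys: Set[str]) -> Dict[str, float]:
    """Parse the text between the first '{' and the last '}' as JSON, keeping known keys."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM reply")
    suggestion = json.loads(reply[start : end + 1])
    return {k: float(v) for k, v in suggestion.items() if k in allowed_keys}


if __name__ == "__main__":
    cfg = {"vehicles_density": 1.2, "vehicles_count": 30}
    eps = [
        {"crashed": 1, "near_misses": 4, "reward": 10.0},
        {"crashed": 0, "near_misses": 2, "reward": 20.0},
    ]
    print(build_prompt(cfg, eps))
    # A made-up reply, standing in for the LLM's output.
    reply = 'Sure: {"vehicles_density": 1.8, "vehicles_count": 40, "weather": 3}'
    print(parse_suggestion(reply, set(cfg)))
```

Filtering the reply against the known configuration keys is a simple guard against the model inventing parameters the simulator would not accept.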