Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation

Hanlin Tian, Kethan Reddy, Yuxiang Feng, Mohammed Quddus, Yiannis Demiris, Panagiotis Angeloudis

TL;DR

CRITICAL is a closed-loop framework that jointly generates safety-critical driving scenarios and trains AV agents, using data from the highD dataset, behavior clustering, surrogate risk metrics, and optional LLM-guided scenario refinement. The approach improves sample efficiency and robustness by exposing PPO-based agents to diverse, risk-focused configurations in HighwayEnv, with time-to-collision (TTC) and a unified risk index guiding criticality assessment. Empirical results show faster learning, higher rewards, and fewer crashes, with further gains when incorporating LLM analysis via LangChain and Mistral-7B-Instruct. The work demonstrates the potential of integrating data-driven scenario generation with language-model reasoning to improve AV safety validation and accelerate RL-based development.

Abstract

This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. We show that establishing a closed feedback loop between the data generation pipeline and the training process can enhance the learning rate during training, elevate overall system performance, and augment safety resilience. Our evaluations, conducted using the Proximal Policy Optimization (PPO) algorithm and the HighwayEnv simulation environment, demonstrate noticeable performance improvements with the integration of critical case generation and LLM analysis, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to hasten the development of AV agents, expand the general scope of RL training, and ameliorate validation efforts for AV safety.

Paper Structure (16 sections, 7 equations, 5 figures, 2 tables)

This paper contains 16 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An illustration of the general algorithmic flow of CRITICAL. An RL agent is exposed to a scenario (which is a function of the environment configuration), and after a designated number of episodes, we collate pertinent environment outputs (yellow) from every episode. These outputs can then be used to directly generate a new scenario, or first be parsed into a prompt and fed into an LLM, which suggests an alternative environment configuration (blue) for instantiating scenarios.
  • Figure 2: Edge-case scenarios (yellow) are defined by their rarity in a given ODD. We depict edge-case scenarios to be below a certain threshold value of the number of occurrences $N$. Boundary scenarios (blue) have a criticality measure $\Gamma$ beyond a threshold value. Critical scenarios (green) are defined as scenarios that have the union of these two circumstances.
  • Figure 3: An architecture diagram mapping out the various components of CRITICAL. The framework first sets up an environment configuration based on typical real-world traffic from the highD dataset. These configurations are then leveraged to generate HighwayEnv scenarios. At the end of each episode, we collect data including failure reports, risk metrics, and rewards, repeating this process multiple times to gather a collection of configuration files with associated scenario risk assessments. To enhance RL training, we analyze a distribution of configurations based on risk metrics, identifying those conducive to critical scenarios. We then either directly use these configurations for new scenarios or prompt an LLM to generate critical scenarios.
  • Figure 4: Training loss comparison for Vanilla PPO (Blue), PPO with Critical Case Generation (Orange), and PPO with Critical Case Generation and Large Language Model Analysis (Green).
  • Figure 5: The distributions on the right (green) are the configurations quantified by $TTC$ near-miss counts, while the distributions on the left (blue) are the configurations quantified by unified risk index $r$ threshold counts. The top row is the baseline, the middle row corresponds to Critical Case Generation without an LLM, and the bottom row corresponds to LLM-enhanced Critical Case Generation.
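
The quantities named in the captions for Figures 2 and 5 can be illustrated with a small sketch. This is not the authors' implementation. It assumes the standard car-following definition of TTC (bumper-to-bumper gap divided by closing speed) and applies the Figure 2 labels literally as worded, with "critical" as the union of the edge-case and boundary conditions. The function names, the `ScenarioStats` container, and all threshold values (`ttc_threshold_s`, `n_max`, `gamma_min`) are hypothetical placeholders, not values taken from the paper.

```python
import math
from dataclasses import dataclass


def time_to_collision(gap_m: float, ego_speed: float, lead_speed: float) -> float:
    """Standard car-following TTC in seconds; infinite when the ego is not closing in."""
    closing_speed = ego_speed - lead_speed
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def count_near_misses(ttc_series, ttc_threshold_s: float = 1.5) -> int:
    """Count time steps whose TTC is below the threshold (threshold is an assumed value)."""
    return sum(1 for ttc in ttc_series if ttc < ttc_threshold_s)


@dataclass
class ScenarioStats:
    occurrences: int    # N: how often the scenario appears in the ODD
    criticality: float  # Gamma: criticality measure of the scenario


def classify(stats: ScenarioStats, n_max: int = 5, gamma_min: float = 0.8) -> set:
    """Label a scenario following the wording of the Figure 2 caption.

    n_max and gamma_min are hypothetical thresholds chosen for illustration.
    """
    labels = set()
    is_edge = stats.occurrences < n_max          # rare in the ODD
    is_boundary = stats.criticality > gamma_min  # criticality beyond threshold
    if is_edge:
        labels.add("edge")
    if is_boundary:
        labels.add("boundary")
    # The caption says "union"; swap `or` for `and` if an intersection is intended.
    if is_edge or is_boundary:
        labels.add("critical")
    return labels


if __name__ == "__main__":
    ttcs = [time_to_collision(30.0, 30.0, 20.0),  # closing at 10 m/s over 30 m
            time_to_collision(10.0, 30.0, 20.0),
            time_to_collision(30.0, 20.0, 30.0)]  # lead vehicle pulling away
    print(ttcs, count_near_misses(ttcs))
    print(classify(ScenarioStats(occurrences=2, criticality=0.9)))
```

Counting near misses per configuration in this way yields the kind of per-configuration tallies that a histogram such as Figure 5 would summarize; the unified risk index $r$ is not sketched here because its definition does not appear in this section.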