Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
Hanlin Tian, Kethan Reddy, Yuxiang Feng, Mohammed Quddus, Yiannis Demiris, Panagiotis Angeloudis
TL;DR
CRITICAL is a closed-loop framework that jointly generates safety-critical driving scenarios and trains AV agents, combining real-world traffic data from the highD dataset, driving-behavior clustering, surrogate risk metrics, and optional LLM-guided scenario refinement. The approach aims to improve sample efficiency and robustness by exposing PPO-based agents to diverse, risk-focused scenario configurations in HighwayEnv, with time-to-collision (TTC) and a unified risk index used to assess criticality. Empirical results show faster learning, higher rewards, and fewer crashes, with further gains when LLM analysis is added via LangChain and Mistral-7B-Instruct. The work demonstrates the potential of combining data-driven scenario generation with language-model reasoning to improve AV safety validation and accelerate RL-based development.
Abstract
This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. We show that establishing a closed feedback loop between the data generation pipeline and the training process can increase the learning rate during training, raise overall system performance, and strengthen safety resilience. Our evaluations, conducted using Proximal Policy Optimization (PPO) and the HighwayEnv simulation environment, demonstrate noticeable performance improvements when critical case generation and LLM analysis are integrated, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to accelerate the development of AV agents, broaden the scope of RL training, and improve validation efforts for AV safety.
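The TL;DR names TTC as one of the surrogate safety measures used to assess criticality. As a minimal sketch of that measure, assuming a simple same-lane car-following case, TTC is the bumper-to-bumper gap divided by the closing speed. The function names and the 2.0 s threshold below are illustrative assumptions, not values or code taken from the paper.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """Return TTC in seconds for a follower behind a leader in the same lane.

    gap_m is the non-negative distance between the two vehicles in metres.
    If the follower is not closing the gap, no collision is projected and
    the function returns math.inf.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def is_critical(ttc_s, threshold_s=2.0):
    """Flag a situation as critical when TTC falls below a threshold.

    The 2.0 s default is a hypothetical choice for illustration only.
    """
    return ttc_s < threshold_s


if __name__ == "__main__":
    # Follower at 30 m/s, leader at 20 m/s, 15 m apart: closing at 10 m/s.
    ttc = time_to_collision(15.0, 30.0, 20.0)
    print(ttc, is_critical(ttc))
```

In a closed-loop setup of the kind the abstract describes, a flag like this could be one input for deciding which scenario configurations to revisit during training; how CRITICAL combines TTC with its unified risk index is not specified in this section.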
