LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving

Yiming Shu, Jiahui Xu, Jiwei Tang, Ruiyang Gao, Chen Sun

TL;DR

LA-RL addresses the safety-efficiency trade-off in autonomous highway driving by integrating a language-guided actor-critic with a safety-critical MPC-DCBF planner. It employs task-specific reward shaping and a slack-enabled planner to balance exploration with safety guarantees. Empirical results show that LA-RL outperforms multiple state-of-the-art baselines, with a success rate approximately 20% higher than a knowledge-graph-based baseline and about 30% higher than a retrieval-augmented-generation-based baseline, and a 100% success rate in low-density scenarios, while maintaining stability and efficiency. This work demonstrates a practical approach to safe, interpretable, and proactive autonomous driving through language-guided decision making and formal safety constraints.

Abstract

Autonomous highway driving demands a critical balance between proactive, efficiency-seeking behavior and robust safety guarantees. This paper proposes Language Action-guided Reinforcement Learning (LA-RL) with Safety Guarantees, a novel framework that integrates the semantic reasoning of large language models (LLMs) into the actor-critic architecture with an improved safety layer. Within this framework, task-specific reward shaping harmonizes the dual objectives of maximizing driving efficiency and ensuring safety, guiding decision-making based on both environmental insights and clearly defined goals. To enhance safety, LA-RL incorporates a safety-critical planner that combines model predictive control (MPC) with discrete control barrier functions (DCBFs). This layer formally constrains the LLM-informed policy to a safe action set and employs a slack mechanism that enhances solution feasibility, prevents overly conservative behavior, and allows for greater policy exploration without compromising safety. Extensive experiments demonstrate that LA-RL significantly outperforms several current state-of-the-art (SOTA) methods, offering a more adaptive, reliable, and robust solution for autonomous highway driving. Compared with existing SOTA methods, it achieves an approximately 20% higher success rate than the knowledge graph (KG) based baseline and about 30% higher than the retrieval augmented generation (RAG) based baseline. In low-density environments, LA-RL achieves a 100% success rate. These results confirm its enhanced exploration of the state-action space and its ability to autonomously adopt more efficient, proactive strategies in complex, mixed-traffic highway environments.
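
The slack idea in the abstract can be illustrated with a minimal sketch. It assumes one common relaxed DCBF condition from the MPC-DCBF literature, h(x_{k+1}) >= omega * (1 - gamma) * h(x_k) with omega >= 0; the abstract does not state the exact constraint used in LA-RL, so the function names, the barrier h = gap - safe gap, and the parameter values below are all hypothetical.

```python
def longitudinal_barrier(gap_m: float, safe_gap_m: float) -> float:
    """Hypothetical barrier h(x): positive while the gap exceeds the safe gap."""
    return gap_m - safe_gap_m


def dcbf_satisfied(h_now: float, h_next: float,
                   gamma: float = 0.5, omega: float = 1.0) -> bool:
    """Check the relaxed DCBF condition h_next >= omega * (1 - gamma) * h_now.

    gamma in (0, 1] bounds how fast the barrier may decay per step.
    omega >= 0 is the slack variable (assumed form): omega = 1 recovers the
    unrelaxed condition, and omega < 1 permits faster decay. Because
    omega >= 0, a state with h_now >= 0 still needs h_next >= 0.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    if omega < 0.0:
        raise ValueError("omega must be non-negative")
    return h_next >= omega * (1.0 - gamma) * h_now


# Gap shrinks from 20 m to 14 m with a 10 m safe gap: h goes from 10 to 4.
h_now = longitudinal_barrier(20.0, 10.0)
h_next = longitudinal_barrier(14.0, 10.0)
strict_ok = dcbf_satisfied(h_now, h_next, gamma=0.5, omega=1.0)
relaxed_ok = dcbf_satisfied(h_now, h_next, gamma=0.5, omega=0.6)
```

In this example the unrelaxed condition rejects the step (it would require h_next >= 5), while the relaxed one accepts it, even though the vehicle remains outside the unsafe set in both cases. In a full planner, omega would be a decision variable of the MPC problem and deviations from 1 would typically be penalized in the cost; that part is omitted here.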

Paper Structure

This paper contains 30 sections, 27 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The overall framework of LA-RL. The framework combines an LLM with RL, where the LLM serves as the language-informed decision-maker, processing language-guided information. The safety-critical planner integrates Model Predictive Control (MPC) and Discrete Control Barrier Functions (DCBFs) with a slack mechanism to generate optimized control inputs, ensuring safe and robust operation. Multiple reward components, including time-to-collision (TTC), collision, speed, exploration, and overtaking rewards, guide the autonomous vehicle's decision-making in various complex settings.
  • Figure 2: The format of the input prompt. It contains the task definition, traffic preference, available actions, and current scenario, which help the EV understand the scenario and make decisions informed by its environment.
  • Figure 3: Description of the longitudinal DCBF and lateral DCBF within the range of the ROI, which are critical components of the safety-critical planner.
  • Figure 4: A comparison of multiple methods based on success steps, evaluated under different settings.
  • Figure 5: Snapshots of scenarios from various configuration settings.
  • ...and 1 more figure
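
The reward components named in the Figure 1 caption (TTC, collision, speed, exploration, overtaking) could be combined as in the sketch below. This is an illustration only: the weighted-sum form, the `RewardTerms` class, and every weight value are assumptions, since this page does not give the paper's actual reward formula.

```python
from dataclasses import dataclass


@dataclass
class RewardTerms:
    """Per-step reward components named in the Figure 1 caption."""
    ttc: float          # time-to-collision term
    collision: float    # collision penalty (0 when no collision)
    speed: float        # driving-efficiency term
    exploration: float  # exploration bonus
    overtaking: float   # overtaking bonus


# Hypothetical weights, chosen only to make the example concrete.
HYPOTHETICAL_WEIGHTS = {
    "ttc": 1.0,
    "collision": 5.0,
    "speed": 0.5,
    "exploration": 0.25,
    "overtaking": 1.0,
}


def shaped_reward(terms: RewardTerms, weights=HYPOTHETICAL_WEIGHTS) -> float:
    """Weighted sum of the components (assumed aggregation rule)."""
    return sum(w * getattr(terms, name) for name, w in weights.items())


step = RewardTerms(ttc=-1.0, collision=0.0, speed=2.0,
                   exploration=4.0, overtaking=1.0)
total = shaped_reward(step)  # -1.0 + 0.0 + 1.0 + 1.0 + 1.0
```

Keeping the terms separate until the final sum makes it easy to log each component and to see which objective, safety or efficiency, dominates a given decision.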