HCRMP: A LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Zhiwen Chen, Bo Leng, Zhuoren Li, Hanming Deng, Guizhe Jin, Ran Yu, Huanxi Wen
TL;DR
This work addresses the vulnerability of autonomous driving systems that rely on large language models (LLMs) by proposing an LLM-Hinted Reinforcement Learning paradigm. The core idea is to decouple LLM outputs from direct policy control: the LLM instead provides semantic hints that augment the state information and influence policy optimization through a multi-critic framework, which mitigates the effect of hallucinations. The HCRMP architecture comprises three modules: Augmented Semantic Representation (ASR), which expands the state with LLM-derived semantics; Contextual Stability Anchor (CSA), which produces reliable multi-critic weights via retrieval-augmented knowledge; and Semantic Cache Module (SCM), which handles LLM latency with a historical semantic memory. Empirical results in CARLA Town 2 show that HCRMP achieves task success rates of up to 80.3% across varied traffic densities and reduces the collision rate by 11.4% in safety-critical scenarios, outperforming baseline LLM-Dominated RL methods. These findings indicate that a weakly coupled LLM-RL system can exploit the LLM's strengths in reasoning and context while preserving the RL agent’s autonomous, stable learning for robust autonomous driving.
Abstract
Integrating Large Language Models (LLMs) with Reinforcement Learning (RL) can enhance autonomous driving (AD) performance in complex scenarios. However, current LLM-Dominated RL methods over-rely on LLM outputs, which are prone to hallucinations. Evaluations show that state-of-the-art LLMs achieve a non-hallucination rate of only approximately 57.95% on essential driving-related tasks. In these methods, hallucinations from the LLM can therefore directly jeopardize the performance of driving policies. This paper argues that maintaining relative independence between the LLM and the RL agent is vital for solving the hallucination problem. Consequently, this paper proposes a novel LLM-Hinted RL paradigm. The LLM generates semantic hints for state augmentation and policy optimization to assist the RL agent in motion planning, while the RL agent counteracts potentially erroneous semantic indications through policy learning to achieve excellent driving performance. Based on this paradigm, we propose the HCRMP (LLM-Hinted Contextual Reinforcement Learning Motion Planner) architecture, which consists of three modules. The Augmented Semantic Representation Module extends the state space. The Contextual Stability Anchor Module enhances the reliability of multi-critic weight hints by utilizing information from the knowledge base. The Semantic Cache Module seamlessly integrates low-frequency LLM guidance with high-frequency RL control. Extensive experiments in CARLA validate HCRMP's strong overall driving performance. HCRMP achieves a task success rate of up to 80.3% under diverse driving conditions with different traffic densities. Under safety-critical driving conditions, HCRMP reduces the collision rate by 11.4%, effectively improving driving performance in complex scenarios.
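The weak coupling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the critic names ("safety", "efficiency"), and the simple clip-and-normalize step are all assumptions made for illustration, and the normalization is only a stand-in for the knowledge-base-driven Contextual Stability Anchor. The sketch shows the three roles named in the abstract: a cache that lets the high-frequency control loop reuse the latest low-frequency LLM hint, state augmentation with hint features, and a multi-critic value blended by hinted weights.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SemanticHint:
    """Hypothetical container for one LLM response (not from the paper)."""
    embedding: List[float]             # semantic features appended to the state
    critic_weights: Dict[str, float]   # raw, possibly unreliable, per-critic weights


@dataclass
class SemanticCache:
    """Keeps the latest hint so the RL loop never waits on the LLM."""
    default: SemanticHint
    _latest: Optional[SemanticHint] = None

    def update(self, hint: SemanticHint) -> None:
        # Called at low frequency, whenever an LLM response arrives.
        self._latest = hint

    def read(self) -> SemanticHint:
        # Called at every control step; falls back to the default hint.
        return self._latest if self._latest is not None else self.default


def normalize_weights(raw: Dict[str, float]) -> Dict[str, float]:
    """Assumed stand-in for weight stabilization: clip negatives, sum to 1."""
    clipped = {name: max(0.0, w) for name, w in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # Degenerate hint: fall back to uniform weights.
        return {name: 1.0 / len(clipped) for name in clipped}
    return {name: w / total for name, w in clipped.items()}


def augment_state(state: List[float], hint: SemanticHint) -> List[float]:
    """State augmentation: concatenate raw state and hint features."""
    return list(state) + list(hint.embedding)


def combined_value(critic_values: Dict[str, float],
                   raw_weights: Dict[str, float]) -> float:
    """Blend per-critic value estimates using normalized hint weights."""
    weights = normalize_weights(raw_weights)
    return sum(weights[name] * value for name, value in critic_values.items())
```

In this sketch the LLM never selects an action. Its output only enters through the augmented state and the critic weights, so the policy can still learn to discount hints that prove misleading.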
