Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving
Hao Pang, Zhenpo Wang, Guoqiang Li
TL;DR
The paper tackles two obstacles to using DRL for autonomous driving decision-making: low learning efficiency and heavy reliance on costly human guidance. It introduces LGDRL, which integrates an LLM-based driving expert into the DRL loop. A Jensen-Shannon divergence-based policy constraint and an intermittent expert-intervened interaction mechanism fuse the expert's guidance into learning while preserving exploration. On a highway driving scenario, LGDRL achieves a high task success rate (≈90%) with improved learning efficiency and substantially faster inference than the LLM, and ablations confirm the critical role of the expert constraint. The approach offers a practical way to combine large language models with DRL for safer, more capable autonomous driving policies, with potential applicability to other complex driving scenarios.
Abstract
Deep reinforcement learning (DRL) shows promising potential for autonomous driving decision-making. However, because of its low learning efficiency, DRL demands extensive computational resources to achieve a qualified policy in complex driving scenarios. Moreover, leveraging expert guidance from humans to enhance DRL performance incurs prohibitively high labor costs, which limits its practical application. In this study, we propose a novel large language model (LLM) guided deep reinforcement learning (LGDRL) framework for addressing the decision-making problem of autonomous vehicles. Within this framework, an LLM-based driving expert is integrated into the DRL to provide intelligent guidance for the learning process. To use the LLM expert's guidance efficiently, the learning and interaction processes of DRL are enhanced through an innovative expert policy constrained algorithm and a novel LLM-intervened interaction mechanism. Experimental results demonstrate that our method not only achieves superior driving performance with a 90% task success rate but also significantly improves learning efficiency and expert guidance utilization efficiency compared to state-of-the-art baseline algorithms. Moreover, the proposed method enables the DRL agent to maintain consistent and reliable performance in the absence of LLM expert guidance. The code and supplementary videos are available at https://bitmobility.github.io/LGDRL/.
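
The expert policy constraint is described in the TL;DR as Jensen-Shannon (JS) divergence-based. As a rough illustration of that quantity only, the sketch below computes the JS divergence between two discrete action distributions and adds it to a loss as a weighted penalty. The penalty form, the function names, the `weight` parameter, and the example distributions are all assumptions made for illustration; the summary does not specify how the paper's objective is actually formulated.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded above by ln(2)."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so both KL terms stay finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def constrained_loss(rl_loss, agent_probs, expert_probs, weight=1.0):
    """Hypothetical penalty form: RL loss plus a weighted JS term.

    This is an assumption for illustration, not the paper's objective.
    """
    return rl_loss + weight * js_divergence(agent_probs, expert_probs)


# Hypothetical action distributions over three discrete maneuvers.
agent_probs = [0.7, 0.2, 0.1]
expert_probs = [0.1, 0.2, 0.7]

print(js_divergence(agent_probs, expert_probs))
print(constrained_loss(0.5, agent_probs, expert_probs, weight=2.0))
```

Unlike the KL divergence, the JS divergence is symmetric and stays finite even when one distribution assigns zero probability to an action the other prefers, which is one plausible reason to choose it as a constraint between an agent policy and an expert policy.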
