Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving

Hao Pang, Zhenpo Wang, Guoqiang Li

TL;DR

The paper addresses two obstacles to DRL-based decision-making in autonomous driving: low learning efficiency and heavy reliance on human guidance. It introduces LGDRL, which integrates an LLM-based driving expert into the DRL loop. A Jensen-Shannon divergence-based policy constraint and an intermittent expert-intervened interaction mechanism are proposed to use expert guidance efficiently while preserving exploration. Experiments on a highway driving scenario show LGDRL achieving a task success rate of about 90% with improved learning efficiency and substantially faster inference than the LLM, while ablations confirm the critical role of the expert constraint. The approach offers a practical path to combining large language models with DRL for safer, more capable autonomous driving policies, with potential applicability to other complex driving scenarios.
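
To make the Jensen-Shannon (JS) divergence mentioned above concrete, the following is a minimal sketch of how the divergence between an agent policy and an expert policy could be computed. It assumes discrete action distributions; the five-action example, the names `agent_policy` and `expert_policy`, and the tolerance `epsilon` are hypothetical illustrations and are not taken from the paper, whose exact constrained objective is not reproduced here.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by ln(2) in nats."""
    # The mixture is positive wherever p or q is positive, so the KL terms
    # below never divide by zero.
    mixture = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical action distributions over five discrete driving actions.
agent_policy = [0.10, 0.20, 0.40, 0.20, 0.10]
expert_policy = [0.05, 0.05, 0.10, 0.70, 0.10]

divergence = js_divergence(agent_policy, expert_policy)

# Hypothetical tolerance; the source does not state a threshold value.
epsilon = 0.05
constraint_satisfied = divergence <= epsilon
print(f"JS divergence: {divergence:.4f}, within tolerance: {constraint_satisfied}")
```

Unlike the KL divergence, the JS divergence is symmetric and stays finite even when the two distributions place mass on different actions, which is one plausible reason to prefer it as a policy constraint.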

Abstract

Deep reinforcement learning (DRL) shows promising potential for autonomous driving decision-making. However, DRL demands extensive computational resources to achieve a qualified policy in complex driving scenarios due to its low learning efficiency. Moreover, leveraging expert guidance from humans to enhance DRL performance incurs prohibitively high labor costs, which limits its practical application. In this study, we propose a novel large language model (LLM) guided deep reinforcement learning (LGDRL) framework for addressing the decision-making problem of autonomous vehicles. Within this framework, an LLM-based driving expert is integrated into DRL to provide intelligent guidance for the learning process. Subsequently, in order to efficiently utilize the guidance of the LLM expert to enhance the performance of DRL decision-making policies, the learning and interaction process of DRL is enhanced through an innovative expert policy constrained algorithm and a novel LLM-intervened interaction mechanism. Experimental results demonstrate that our method not only achieves superior driving performance with a 90% task success rate but also significantly improves the learning efficiency and expert guidance utilization efficiency compared to state-of-the-art baseline algorithms. Moreover, the proposed method enables the DRL agent to maintain consistent and reliable performance in the absence of LLM expert guidance. The code and supplementary videos are available at https://bitmobility.github.io/LGDRL/.

Paper Structure

This paper contains 22 sections, 29 equations, 12 figures, 8 tables, 2 algorithms.

Figures (12)

  • Figure 1: Comparison of traditional DRL and the proposed LGDRL framework.
  • Figure 2: LLM guided deep reinforcement learning framework. Within this framework, an LLM driving expert is prompted to guide the learning process of the DRL agent. A novel expert policy constrained DRL algorithm, which integrates a policy constraint based on Jensen-Shannon (JS) divergence into the learning objective, helps the DRL agent learn more effectively from the expert guidance. The actions applied to the environment are determined by a novel LLM-intervened interaction mechanism, which allows the LLM expert to intervene in the DRL agent's actions when necessary.
  • Figure 3: The LLM generates a textual response based on the prompts created by the prompt generator. The action extractor then extracts the corresponding action guidance from this response. A re-query mechanism within the action extractor is used to revise the response into a correct format.
  • Figure 4: The expert-intervened interaction mechanism allows the LLM expert to intermittently intervene in the interactions between the DRL agent and the environment based on the DRL action safety condition and the intervention permission condition.
  • Figure 5: The experimental scenario constructed in the highway-env simulator. The yellow vehicle is the ego vehicle, and the blue vehicles are the surrounding vehicles. The shading shows the historical trajectories of the vehicles. The red dot marks the target point.
  • ...and 7 more figures
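
The interaction mechanism described in the Figure 4 caption can be sketched as a simple gate on the agent's action. This is only an illustration under stated assumptions: the function name `choose_action`, its parameters, and the rule that the expert intervenes only when the agent's action is unsafe and intervention is permitted are hypothetical. The caption names the two conditions but does not say how they are evaluated or combined.

```python
from typing import Callable, Tuple


def choose_action(
    agent_action: int,
    agent_action_is_safe: bool,
    intervention_permitted: bool,
    query_expert: Callable[[], int],
) -> Tuple[int, bool]:
    """Return (action applied to the environment, whether the expert intervened).

    Assumption: the expert replaces the agent's action only when that action
    fails the safety condition AND the permission condition allows intervention.
    """
    if agent_action_is_safe or not intervention_permitted:
        # Keep the agent's own action, so no expert query is issued.
        return agent_action, False
    # The expert is queried lazily, only when an intervention actually happens.
    return query_expert(), True


# Hypothetical usage: action indices and the expert's answer are placeholders.
expert_calls = []


def fake_expert() -> int:
    expert_calls.append(1)
    return 3


print(choose_action(1, True, True, fake_expert))
print(choose_action(1, False, True, fake_expert))
print(choose_action(1, False, False, fake_expert))
```

Passing the expert as a callable keeps the expensive LLM query off the common path, which matches the caption's description of intervention as intermittent rather than continuous.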