Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

Yuji Cao, Huan Zhao, Yuheng Cheng, Ting Shu, Yue Chen, Guolong Liu, Gaoqi Liang, Junhua Zhao, Jinyue Yan, Yun Li

TL;DR

This survey defines LLM-enhanced RL as integrating large language models into the classical RL loop to address sample efficiency, reward design, generalization, and long-horizon planning. It proposes a four-role taxonomy (information processor, reward designer, decision-maker, and generator) to systematically categorize how LLMs can assist RL across perception, policy shaping, action, and world-model generation. The paper analyzes representative methods per role, discusses practical applications (e.g., robotics, autonomous driving, energy systems), and outlines opportunities and risks, including biases, safety, and computational demands. By mapping LLM capabilities to RL challenges, it provides a framework and roadmap for future research that aims to ground RL in real-world, multimodal environments.

Abstract

With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature on LLM-enhanced RL and summarize its characteristics compared to conventional RL methods, aiming to clarify the research scope and directions for future studies. Utilizing the classical agent-environment interaction paradigm, we propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, comprising four roles: information processor, reward designer, decision-maker, and generator. For each role, we summarize the methodologies, analyze the specific RL challenges that are mitigated, and provide insights into future directions. Lastly, we present a comparative analysis of the roles and discuss potential applications, prospective opportunities, and challenges of LLM-enhanced RL. By proposing this taxonomy, we aim to provide a framework for researchers to effectively leverage LLMs in the RL field, potentially accelerating the adoption of RL in complex applications such as robotics, autonomous driving, and energy systems.

Paper Structure (47 sections, 2 equations, 6 figures, 3 algorithms)

This paper contains 47 sections, 2 equations, 6 figures, 3 algorithms.

Figures (6)

  • Figure 1: Classical reinforcement learning paradigm.
  • Figure 2: Framework of LLM-enhanced RL in classical Agent-Environment interactions, where LLM plays different roles in enhancing RL.
  • Figure 3: LLM as an information processor. (i) Feature Representation Extractor: a frozen or fine-tuned LLM extracts meaningful representations for downstream RL networks. During fine-tuning, given an observation ($O_t$), an invariant feature abstraction ($\tilde{S}_t$) is learned with the contrastive loss ($\mathcal{L}^{c}_t$) and then fed into the actor-critic network. After fine-tuning, given different observations ($O_t$) and ($O'_t$) with appearance variation, the extracted representation is invariant, leading to robust RL performance. (ii) Language Translator: the LLM interprets diverse natural language inputs, converting them into a standardized, task-specific format that the RL agent can efficiently process and act upon.
  • Figure 4: LLM as a reward designer. (i) Implicit Reward Model: LLMs provide rewards through direct prompting or alignment scoring between language instructions and visual observations. (ii) Explicit Reward Model: LLMs generate executable code for reward functions, with potential for self-refinement through evaluation loops.
  • Figure 5: LLM as a decision-maker. (i) Action-Making: given a $T$-length trajectory $\tau = (\hat{R}_{1}, s_{1}, a_{1}, \dots, \hat{R}_{T}, s_{T}, a_{T})$ as an ordered sequence of return-to-go $\hat{R}$, states $s$, and actions $a$, the LLM learns to predict the action $a'_t$ by minimizing the mean squared error loss $\mathcal{L} = \sum_{t}\left\|a_t-a_t^{\prime}\right\|_2^2$. (ii) Action-Guiding: the LLM generates a reduced set of action candidates for agents or generates expert actions to regularize RL learning.
  • ...and 1 more figure
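
The action-making objective in the Figure 5 caption can be illustrated with a small sketch. It builds the trajectory $\tau$ in the caption's (return-to-go, state, action) order and evaluates $\mathcal{L} = \sum_{t}\|a_t-a_t^{\prime}\|_2^2$ for a given set of predicted actions. The helper names (`returns_to_go`, `build_trajectory`, `action_loss`) are hypothetical, and the undiscounted definition of return-to-go as the sum of remaining rewards is an assumption, since the caption does not define it. No sequence model is trained here; the predictions are simply supplied as input.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One timestep in the caption's order: (return-to-go, state, action).
Step = Tuple[float, Vector, Vector]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Assumed definition: R_hat_t is the undiscounted sum of rewards from t to T."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    out.reverse()
    return out


def build_trajectory(
    rewards: Sequence[float],
    states: Sequence[Vector],
    actions: Sequence[Vector],
) -> List[Step]:
    """Assemble tau = (R_hat_1, s_1, a_1, ..., R_hat_T, s_T, a_T) as a list of steps."""
    if not (len(rewards) == len(states) == len(actions)):
        raise ValueError("rewards, states, and actions must have the same length")
    return list(zip(returns_to_go(rewards), states, actions))


def action_loss(
    target_actions: Sequence[Vector],
    predicted_actions: Sequence[Vector],
) -> float:
    """Sum over timesteps of the squared L2 distance between a_t and a'_t."""
    if len(target_actions) != len(predicted_actions):
        raise ValueError("target and predicted actions must have the same length")
    total = 0.0
    for a, a_pred in zip(target_actions, predicted_actions):
        total += sum((x - y) ** 2 for x, y in zip(a, a_pred))
    return total


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    states = [[0.0], [1.0], [2.0]]
    actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tau = build_trajectory(rewards, states, actions)
    # Stand-in for model outputs a'_t; a real setup would obtain these from the LLM.
    predicted = [[0.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
    print(tau[0][0], action_loss(actions, predicted))
```

In a training setup, the value returned by `action_loss` would be the quantity minimized with respect to the model parameters; how the LLM consumes the trajectory tokens is left out of this sketch.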