The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning

Sheila Schoepp, Masoud Jafaripour, Yingyue Cao, Tianpei Yang, Fatemeh Abdollahi, Shadan Golestan, Zahin Sufiyan, Osmar R. Zaiane, Matthew E. Taylor

TL;DR

The paper surveys the integration of LLMs and VLMs with RL, addressing critical challenges in prior knowledge, long-horizon planning, and reward design. It introduces a unifying taxonomy with three roles—Agent, Planner, and Reward—and reviews representative works in each category, highlighting parametric and non-parametric agent approaches, comprehensive and incremental planning, and language-/vision-guided reward design. Key contributions include synthesizing methods, clarifying trade-offs, and outlining open problems such as grounding, bias mitigation, and multimodal representations. By consolidating existing research and suggesting directions, the paper provides a framework to advance RL that leverages natural language and visual understanding for sequential decision-making.

Abstract

Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. In this survey, we review representative works in which LLMs and VLMs are used to overcome key challenges in RL, such as lack of prior knowledge, long-horizon planning, and reward design. We present a taxonomy that categorizes these LLM/VLM-assisted RL approaches into three roles: agent, planner, and reward. We conclude by exploring open problems, including grounding, bias mitigation, improved representations, and action advice. By consolidating existing research and identifying future directions, this survey establishes a framework for integrating LLMs and VLMs into RL, advancing approaches that unify natural language and visual understanding with sequential decision-making.

Paper Structure

This paper contains 23 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: A taxonomy for LLM- and VLM-assisted RL.
  • Figure 2: LLM/VLM as Agent.
  • Figure 3: LLM/VLM as Planner.
  • Figure 4: LLM/VLM as Reward.