Reinforcement Learning Problem Solving with Large Language Models
Sina Gholamian, Domingo Huh
TL;DR
This work explores using large language models (LLMs) as reinforcement learning (RL) agents: RL problems are formulated as prompts, and the LLM is iteratively prompted to learn policies via $Q$-Learning within a Markov Decision Process (MDP) framework. A structured Prompt Framework encodes the MDP components $S$, $A$, $P$, $R$, and $\gamma$ into the prompt, enabling episode simulation and policy extraction directly from the LLM, with self-checks to ensure requirements are met. Two enterprise case studies, Research Scientist and Legal Matter Intake, demonstrate that the LLM can converge to near-optimal workflows (e.g., Start→...→End) under configured $\gamma$ values, often within a few iterations. The results highlight the practical potential of language-driven RL for planning and workflow optimization; the work also discusses limitations such as model variability, scope, and ethical considerations. It points toward future extensions including LLM-based planning, personalized prompts, and integration with enterprise planning tools.
Abstract
Large Language Models (LLMs) encapsulate an extensive amount of world knowledge, which has enabled their application in various domains and improved performance on a variety of Natural Language Processing (NLP) tasks. It has also made conversation-based interaction between humans and AI systems a more accessible paradigm for problem solving. However, one avenue with untapped potential is the use of LLMs as Reinforcement Learning (RL) agents to enable conversational RL problem solving. In this study, we therefore explore formulating Markov Decision Process (MDP)-based RL problems as LLM prompting tasks. We demonstrate how LLMs can be iteratively prompted to learn and optimize policies for specific RL tasks. In addition, we use the introduced prompting technique for LLM-facilitated episode simulation and Q-Learning. We then show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake" workflows.
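For readers unfamiliar with Q-Learning, the following is a minimal sketch of the standard tabular update rule on a toy workflow MDP. It is not the authors' implementation: in the approach described here, episode simulation and the Q-Learning updates are carried out by the LLM through prompting, whereas this sketch uses plain Python only to illustrate what is being learned. The state names, actions, rewards, and hyperparameters (`STATES`, `ACTIONS`, `step`, `alpha`, `epsilon`) are hypothetical and are not taken from the paper's case studies.

```python
import random

# Hypothetical toy workflow (an assumption, not one of the paper's case studies).
STATES = ["Start", "Draft", "Review", "End"]
ACTIONS = ["advance", "revise"]
GAMMA = 0.9  # discount factor, playing the role of gamma in the MDP


def step(state, action):
    """Assumed deterministic transition and reward function for the toy MDP."""
    i = STATES.index(state)
    if action == "advance":
        nxt = STATES[min(i + 1, len(STATES) - 1)]
    else:  # "revise" moves one step back, but never before "Start"
        nxt = STATES[max(i - 1, 0)]
    reward = 10.0 if nxt == "End" else -1.0  # small cost per step, bonus at the end
    return nxt, reward


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Standard tabular Q-Learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "Start"
        for _ in range(50):  # cap the episode length
            if state == "End":
                break
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action)
            # The terminal state has no future value.
            best_next = 0.0 if nxt == "End" else max(q[(nxt, a)] for a in ACTIONS)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[(state, action)] += alpha * (reward + GAMMA * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Extract the greedy action for every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES if s != "End"}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In this toy setting the greedy policy extracted from the learned table selects "advance" in every non-terminal state, which corresponds to the kind of Start→...→End workflow mentioned in the TL;DR.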
