Reinforcement Learning Problem Solving with Large Language Models

Sina Gholamian; Domingo Huh

TL;DR

This work explores using large language models as reinforcement learning agents: RL problems are formulated as prompts, and the LLM is prompted iteratively to learn policies via $Q$-Learning within an MDP framework. A structured Prompt Framework encodes $S$, $A$, $P$, $R$, and $\gamma$ into prompt components, enabling episode simulation and policy extraction directly from the LLM, with self-checks to ensure that requirements are met. Two enterprise case studies, Research Scientist and Legal Matter Intake, demonstrate that the LLM can converge to near-optimal workflows (e.g., Start→...→End) under configured $\gamma$ values, often within a few iterations. The results highlight the practical potential of language-driven RL for planning and workflow optimization; the paper also discusses limitations such as model variability, scope, and ethical considerations. The work points toward future extensions including LLM-based planning, personalized prompts, and integration with enterprise planning tools.
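To make the idea of encoding $S$, $A$, $P$, $R$, and $\gamma$ into prompt components concrete, the following is a minimal sketch, not the authors' actual prompt. The component names (Context Setup, Task, Inputs, Requirements, Outputs, Iterative Check) follow the paper's section outline; the wording of each component, the `WorkflowMDP` container, and the `build_prompt` helper are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkflowMDP:
    """Hypothetical container for the MDP components (S, A, P, R, gamma)."""
    states: list
    actions: list
    transitions: dict  # (state, action) -> next state
    rewards: dict      # (state, action) -> reward
    gamma: float


def build_prompt(mdp: WorkflowMDP, task: str) -> str:
    """Assemble a prompt string; the text of each component is an assumption."""
    transition_lines = [
        f"  P: {s} --{a}--> {nxt}" for (s, a), nxt in mdp.transitions.items()
    ]
    reward_lines = [
        f"  R({s}, {a}) = {r}" for (s, a), r in mdp.rewards.items()
    ]
    parts = [
        "Context Setup: Treat the workflow below as a Markov Decision Process.",
        f"Task: {task}",
        "Inputs:",
        f"  States S: {', '.join(mdp.states)}",
        f"  Actions A: {', '.join(mdp.actions)}",
        *transition_lines,
        *reward_lines,
        f"  Discount factor gamma: {mdp.gamma}",
        "Requirements: Simulate episodes and update Q-values with Q-Learning.",
        "Outputs: The learned policy as a path from the first state to the last.",
        "Iterative Check: Verify that every requirement is met; if not, revise.",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    toy = WorkflowMDP(
        states=["Start", "End"],
        actions=["finish"],
        transitions={("Start", "finish"): "End"},
        rewards={("Start", "finish"): 1.0},
        gamma=0.9,
    )
    print(build_prompt(toy, "Find the highest-return workflow."))
```

Keeping the MDP in a structured object and rendering it to text in one place makes it straightforward to re-issue the same prompt with a different $\gamma$ or reward table.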

Abstract

Large Language Models (LLMs) encapsulate an extensive amount of world knowledge, which has enabled their application across domains to improve performance on a variety of Natural Language Processing (NLP) tasks. It has also made conversation-based interaction between humans and AI systems a more accessible paradigm for problem solving. One avenue with untapped potential, however, is the use of LLMs as Reinforcement Learning (RL) agents to enable conversational RL problem solving. In this study, we therefore explore formulating Markov Decision Process-based RL problems as LLM prompting tasks. We demonstrate how LLMs can be iteratively prompted to learn and optimize policies for specific RL tasks. In addition, we use the introduced prompting technique for LLM-facilitated episode simulation and Q-Learning. We then show the practicality of our approach through two detailed case studies, the "Research Scientist" and "Legal Matter Intake" workflows.
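For reference, the sketch below shows conventional tabular Q-Learning on a toy workflow, which is the procedure the abstract says the LLM is prompted to carry out. It is not the paper's method: in the paper the LLM itself simulates episodes and performs the updates. The states, actions, rewards, and hyperparameters here are all hypothetical.

```python
import random

# Hypothetical toy workflow; not taken from the paper's case studies.
STATES = ["Start", "Draft", "Review", "End"]
ACTIONS = ["advance", "stay"]


def step(state, action):
    """Deterministic toy dynamics: 'advance' moves one state forward."""
    i = STATES.index(state)
    nxt = STATES[i + 1] if action == "advance" else state
    reward = 10.0 if nxt == "End" else -1.0  # assumed reward scheme
    return nxt, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "Start"
        for _ in range(50):  # cap on episode length
            if state == "End":
                break
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            # The terminal state has no future value.
            best_next = 0.0 if nxt == "End" else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_path(q):
    """Follow the greedy policy from Start, stopping at End or after a few steps."""
    path, state = ["Start"], "Start"
    while state != "End" and len(path) <= len(STATES):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
        path.append(state)
    return path


if __name__ == "__main__":
    print(" -> ".join(greedy_path(q_learning())))
```

The greedy path extracted from the learned Q-table plays the same role as the Start→...→End workflow that the paper extracts from the LLM's output.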

Paper Structure (29 sections, 3 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Prior Art
Preliminaries
Q-Learning
Methodology
Prompt Framework
Context Setup.
Task.
Inputs.
Requirements.
Outputs.
Iterative Check.
Algorithm
Case Studies
Research Scientist Workflow
...and 14 more sections

Figures (5)

Figure 1: Research Scientist workflow.
Figure 2: Legal Matter Intake workflow.
Figure 3: MLLM agent for Conflict Assessment state.
Figure 4: This prompt is used to model the research scientist workflow based on the Markov Decision Process.
Figure 5: This prompt is used for iterative verification of the LLM outputs until the desired outputs are reached. Task, States, Actions, and Rewards are placeholders for use-case-specific values; e.g., the values from Figure 4 can be used here for the research scientist workflow.
