Reinforcement Learning Environment with LLM-Controlled Adversary in D&D 5th Edition Combat

Joseph Emmanuel DL Dayo; Michel Onasis S. Ogbinar; Prospero C. Naval

TL;DR

This work integrates Large Language Models (LLMs) into a Reinforcement Learning (RL) framework by placing an LLM-controlled adversary into Dungeons & Dragons 5th Edition (D&D 5E) combat to challenge small RL agents trained with Deep Q-Networks (DQN). A D&D 5E game engine provides a 7×7 observation space, a pre-generated valid-action set, and a prompting-based interface that converts game states into LLM-friendly inputs. RL updates follow the Q-learning form of the Bellman update, $Q(s, a) \leftarrow Q(s, a) + \alpha ( r + \gamma \max_{a'} Q(s', a') - Q(s, a) )$, where $\alpha$ is the learning rate and $\gamma$ is the discount factor, while LLMs such as GPT-4o, LLaMA 3, and Mistral drive the adversary's strategy. Results show that RL agents trained against LLM adversaries often converge faster and achieve higher rewards. In multi-class scenarios, RL methods tend to dominate, while the LLM adversaries serve as adaptive teachers, albeit with slower real-time decision-making. The study contributes an open-source, LLM-augmented RL testbed for strategic decision-making in complex, rule-based environments, with implications for educational simulations and AI-driven interactive systems. Future work includes optimizing prompting strategies, reducing inference latency, extending to multi-agent collaboration, and exploring hybrid RL-LLM models to improve robustness and adaptability.
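To make the update rule above concrete, here is a minimal tabular sketch in Python. It is an illustration only: the paper trains DQN agents, which approximate $Q$ with a neural network rather than a lookup table, and the function name `q_update`, the string-valued states and actions, and the default hyperparameters are all assumptions made for this example. The one detail taken from the summary is that the maximum is computed over a supplied set of valid actions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    Hypothetical helper for illustration; not the paper's DQN implementation.
    """
    # Bootstrap from the best valid next action; if there are none
    # (terminal state), the bootstrap term is 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Temporal-difference error: r + gamma * max_a' Q(s', a') - Q(s, a)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Usage with made-up states and actions; unseen entries default to 0.0.
q = defaultdict(float)
first = q_update(q, "s0", "attack", 1.0, "s1", ["attack", "move"],
                 alpha=0.5, gamma=0.9)
second = q_update(q, "s1", "move", 0.0, "s0", ["attack"],
                  alpha=0.5, gamma=0.9)
```

In the first call every table entry is still zero, so the update reduces to $0 + 0.5 \cdot (1 + 0.9 \cdot 0 - 0) = 0.5$. The second call bootstraps from that value: $0 + 0.5 \cdot (0 + 0.9 \cdot 0.5 - 0) = 0.225$.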

Abstract

This study designs and implements a reinforcement learning (RL) environment based on D&D 5E combat scenarios, in which smaller RL agents are challenged by a robust adversarial agent controlled by advanced Large Language Models (LLMs) such as GPT-4o and LLaMA 3 8B. The smaller agents are trained with Deep Q-Networks (DQN), creating a testbed for strategic AI development that also serves as an educational tool by simulating dynamic and unpredictable combat scenarios. We successfully integrated sophisticated language models into the RL framework, enhancing strategic decision-making processes. Our results indicate that while RL agents generally outperform LLM-controlled adversaries in standard metrics, the strategic depth provided by LLMs significantly enhances the overall AI capabilities in this complex, rule-based setting. We discuss the novelty of our approach and its implications for mastering intricate environments and developing adaptive strategies, alongside potential innovations in AI-driven interactive simulations. This paper aims to demonstrate how integrating LLMs can create more robust and adaptable AI systems, providing valuable insights for further research and educational applications.
