Large Language Models as Agents in Two-Player Games
Yang Liu, Peng Sun, Hang Li
TL;DR
This paper reframes LLM training and alignment as learning within two-player language-based games. It unifies pre-training, supervised fine-tuning (SFT), RLHF, prompting, and in-context learning under a game-theoretic, extensive-form framework. By mapping each training stage to agent-learning concepts and analyzing environments as zero-sum, cooperative, or mixed games, it offers a principled lens on data design, reward shaping, and long-horizon reasoning, with the aim of mitigating hallucination and improving robustness. The authors discuss data structuring (e.g., Q-A, Q-C-A), meta-learning across tasks, and world-model considerations. They also extend the framework to adversarial and cooperative settings, including the pursuit of superhuman performance and red-teaming. The work aims to guide future research on LLM alignment, safety, and capability enhancement by bridging insights from game theory (GT), reinforcement learning (RL), and multi-agent systems (MAS) with practical LLM training and prompting strategies, while highlighting open questions and potential societal impacts.
Abstract
By formally defining the training processes of large language models (LLMs), which usually encompass pre-training, supervised fine-tuning, and reinforcement learning from human feedback, within a single, unified machine learning paradigm, we can glean pivotal insights for advancing LLM technologies. This position paper delineates the parallels between the training methods of LLMs and the strategies used to develop agents in two-player games, as studied in game theory, reinforcement learning, and multi-agent systems. We propose re-conceptualizing LLM learning processes as agent learning in language-based games. This framework offers new perspectives on the successes and challenges of LLM development, including a fresh understanding of how to address alignment issues and other strategic considerations. Furthermore, our two-player game approach points to novel data preparation and machine learning techniques for training LLMs.
