Reflection of Episodes: Learning to Play Game from Expert and Self Experiences

Xiaojie Xu, Zongyuan Li, Chang Lu, Runnan Qi, Yanan Ni, Lumin Jiang, Xiangbei Liu, Xuebo Zhang, Yongchun Fang, Kuihua Huang, Xian Guo, Zhanghua Wu, Zhenya Li

TL;DR

The paper addresses the challenge of enabling large language models to learn and improve strategic play in a complex RTS environment. It introduces Reflection of Episodes (ROE), a framework that uses game-phase-based keyframe selection, expert and self reflections, and an iterative prompt-rewrite loop to evolve strategy across episodes. Empirical results in TextStarCraft II show ROE achieving higher win rates than a CoS baseline. Ablation experiments confirm the importance of the keyframe selection and reflection components and indicate that the learned strategies are partially interpretable. The work offers a practical approach to improving LLM-based decision making in dynamic environments and lays groundwork for more transparent, self-improving AI agents in RTS settings.

Abstract

StarCraft II is a complex and dynamic real-time strategy (RTS) game environment that is well suited to artificial intelligence and reinforcement learning research. To address the problem of Large Language Model (LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes (ROE) framework based on expert experience and self-experience. The framework first obtains key information in the game through a keyframe selection method, then makes decisions based on expert experience and self-experience. After a game is completed, it reflects on the previous experience to obtain new self-experience. In our experiments, the method beat the bot at the Very Hard difficulty in TextStarCraft II. We analyze the LLM's in-game data in detail and verify the method's effectiveness.
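
The loop described in the abstract (select keyframes, decide from expert and self experience, reflect after each game) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Frame`, `EpisodeResult`, `select_keyframes`, `build_prompt`, and `run_roe` are hypothetical, the fixed-length game phases are an assumed stand-in for the paper's keyframe selection method, and `play_episode` and `reflect` are caller-supplied stubs standing in for the game environment and the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Frame:
    time_s: int  # game time in seconds
    text: str    # textual observation at that time


@dataclass
class EpisodeResult:
    won: bool
    frames: List[Frame]


def select_keyframes(frames: List[Frame], phase_len_s: int = 180) -> List[Frame]:
    """Assumed rule: keep the last frame of each fixed-length game phase."""
    last_in_phase: Dict[int, Frame] = {}
    for frame in frames:
        # Later frames overwrite earlier ones in the same phase.
        last_in_phase[frame.time_s // phase_len_s] = frame
    return [last_in_phase[phase] for phase in sorted(last_in_phase)]


def build_prompt(expert_experience: str, self_experience: str) -> str:
    """Combine both experience sources into one prompt (hypothetical format)."""
    return (
        f"Expert experience:\n{expert_experience}\n\n"
        f"Self experience:\n{self_experience or '(none yet)'}"
    )


def run_roe(
    play_episode: Callable[[str], EpisodeResult],
    reflect: Callable[[List[Frame], bool, str], str],
    expert_experience: str,
    n_episodes: int,
) -> Tuple[str, List[bool]]:
    """Play n_episodes games, rewriting the self experience after each one."""
    self_experience = ""
    outcomes: List[bool] = []
    for _ in range(n_episodes):
        prompt = build_prompt(expert_experience, self_experience)
        result = play_episode(prompt)                 # one full game
        keyframes = select_keyframes(result.frames)   # compress the episode
        self_experience = reflect(keyframes, result.won, self_experience)
        outcomes.append(result.won)
    return self_experience, outcomes
```

In practice `play_episode` would wrap a TextStarCraft II game and `reflect` would wrap an LLM call; keeping both as injected callables lets the loop itself be exercised with simple stand-ins.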

Paper Structure

This paper contains 17 sections, 13 figures, 2 tables, 2 algorithms.

Figures (13)

  • Figure 1: StarCraft II. A complex and dynamic real-time strategy game environment, which makes it very suitable for artificial intelligence research.
  • Figure 2: Reflection of Episodes framework. The framework consists of the TextStarCraft II environment and a reflection structure. After an episode, the reflection structure generates a new prompt and applies it to the next game.
  • Figure 3: Reflection and Strategy Iteration. In three consecutive games, expert reflection and two generations of self-reflection took the game from defeat to victory.
  • Figure 4: Detailed game analysis at Very Hard difficulty. (a) At 1:36, we sent the first reconnaissance plane to detect the enemy; (b) At 1:54, we built our first sub-Nexus; (c) At 5:42, we set up an outpost to respond to the enemy attack; (d) At 9:24, we engaged the enemy in the first major confrontation; (e) At 12:12, we sent troops to attack the enemy Nexus; (f) At 15:10, we won the game by defeating all enemy Nexuses.
  • Figure 5: Comparison of baseline experiments in resources
  • ...and 8 more figures