Self-Evolving Multi-Agent Framework for Efficient Decision Making in Real-Time Strategy Scenarios

Li Ma, Hao Peng, Yiming Wang, Hongbin Luo, Jie Liu, Kongjing Gu, Guanlin Wu, Hui Lin, Lei Ren

Abstract

Large language models (LLMs) have demonstrated exceptional potential in complex reasoning, pioneering a new paradigm for autonomous agent decision making in dynamic settings. However, in Real-Time Strategy (RTS) scenarios, LLMs suffer from a critical speed-quality trade-off. Specifically, expansive state spaces and time limits render inference delays prohibitive, while stochastic planning errors undermine logical consistency. To address these challenges, we present SEMA (Self-Evolving Multi-Agent), a novel framework designed for high-performance, low-latency decision-making in RTS environments. This collaborative multi-agent framework facilitates self-evolution by adaptively calibrating model bias through in-episode assessment and cross-episode analysis. We further incorporate dynamic observation pruning based on structural entropy to model game states topologically. By distilling high-dimensional data into core semantic information, this approach significantly reduces inference time. We also develop a hybrid knowledge-memory mechanism that integrates micro-trajectories, macro-experience, and hierarchical domain knowledge, thereby enhancing both strategic adaptability and decision consistency. Experiments across multiple StarCraft II maps demonstrate that SEMA achieves superior win rates while reducing average decision latency by over 50%, validating its efficiency and robustness in complex RTS scenarios.

Paper Structure

This paper contains 18 sections, 9 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The SEMA framework addresses two pivotal challenges for LLMs in RTS environments. First, the massive volume of observational data produces excessively long input sequences, which increases reasoning latency and hinders real-time response. Second, the inherent stochastic bias of LLMs induces inconsistent decision logic: even in the exact same scenario, two distinct decisions can yield diametrically opposed outcomes, such as a shift from victory to defeat. This volatility severely undermines the robustness of agents in complex adversarial settings.
  • Figure 2: Overview of SEMA. First, structural modeling and dynamic pruning are employed to extract core observations, reducing reasoning latency. Second, decision and evaluation agents perform closed-loop calibration via history retrieval to suppress stochastic bias. Finally, the policy agent analyzes episode performance and updates experience, driving the continuous self-evolution of strategic logic.
  • Figure 3: Performance comparison of win rate and response time across diverse scenarios. (a) 3m; (b) 8m; (c) 25m; (d) Flat64 (VeryEasy); (e) Flat64 (Easy). For each point, the pair in parentheses $(x, y)$, e.g., $(0.49, 88)$, denotes the average response time $x$ in seconds and the success rate $y$ in percent.
  • Figure 4: Ablation study of structural-entropy-driven pruning on response time across different maps and difficulty levels. (a) 3m; (b) 8m; (c) Flat64 (VeryEasy); (d) Flat64 (Easy).
  • Figure 5: Hyperparameter sensitivity analysis and semantic filtering visualization on the Simple64-Lv2 map. (a) and (b) (at frame 30 and frame 200, respectively) visualize the structural entropy-driven dynamic pruning process of observation information, where red regions indicate observation attribute nodes with $\Delta H$ below the threshold $\mu$ that require pruning, while green regions represent retained attribute nodes; (c) shows the relationship between average win rate and the number of tokens under different capacity factors $N$.
  • ...and 1 more figure
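
The Figure 5 caption describes pruning observation attribute nodes whose $\Delta H$ falls below a threshold $\mu$. The page does not give SEMA's actual definition of $\Delta H$ or its graph construction, so the following is only a minimal sketch under stated assumptions: the observation is modeled as an undirected, unweighted graph, the entropy is the standard one-dimensional structural entropy $H(G) = -\sum_i \frac{d_i}{2m} \log_2 \frac{d_i}{2m}$, and $\Delta H$ for a node is taken to be the absolute change in $H$ when that node is removed. The function names `structural_entropy` and `prune_observation` and the example node names are hypothetical.

```python
import math
from collections import defaultdict


def structural_entropy(edges):
    """One-dimensional structural entropy of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs. Returns 0.0 for an empty graph.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    volume = sum(degree.values())  # equals 2 * number of edges
    if volume == 0:
        return 0.0
    return -sum((d / volume) * math.log2(d / volume) for d in degree.values())


def prune_observation(edges, attribute_nodes, mu):
    """Split attribute nodes into (kept, pruned) by an entropy-change threshold.

    ASSUMPTION: delta_h is the absolute change in structural entropy when the
    node and its incident edges are removed. The paper's definition may differ.
    """
    base = structural_entropy(edges)
    kept, pruned = [], []
    for node in attribute_nodes:
        remaining = [(u, v) for u, v in edges if node not in (u, v)]
        delta_h = abs(base - structural_entropy(remaining))
        if delta_h < mu:
            pruned.append(node)  # low structural contribution: drop from prompt
        else:
            kept.append(node)
    return kept, pruned


if __name__ == "__main__":
    # Hypothetical toy observation graph: one unit linked to its attributes.
    toy_edges = [
        ("marine", "health"),
        ("marine", "position"),
        ("marine", "cooldown"),
        ("position", "enemy_nearby"),
    ]
    attributes = ["health", "position", "cooldown", "enemy_nearby"]
    kept, pruned = prune_observation(toy_edges, attributes, mu=0.3)
    print("kept:", kept)
    print("pruned:", pruned)
```

Only the kept nodes would then be serialized into the LLM prompt, which is how a threshold of this kind could shorten the input sequence. The value of `mu` used here is arbitrary and is not taken from the paper.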