LLM-PySC2: Starcraft II learning environment for Large Language Models

Zongyuan Li, Yanan Ni, Runnan Qi, Lumin Jiang, Chang Lu, Xiaojie Xu, Xiangbei Liu, Pengfei Li, Yunzheng Guo, Zhe Ma, Huanyu Li, Hui Wu, Xian Guo, Kuihua Huang, Xuebo Zhang

TL;DR

The paper tackles the challenge of enabling large language models (LLMs) to make decisions in StarCraft II by introducing LLM-PySC2, an environment that exposes the full pysc2 action space, rich multi-modal observations, and a native multi-agent framework. It presents an asynchronous query architecture and task-specific Wiki knowledge integration to support LLM-based planning, learning, and collaboration. Through macro-decision and micro-operation experiments, the authors show that while LLMs have zero-shot decision-making potential, their performance remains inconsistent due to insufficient domain knowledge and hallucinations, necessitating task-aware instructions and deployment-time learning. Overall, LLM-PySC2 provides a scalable platform to probe and advance LLM-based decision-making in highly complex, multi-agent environments, guiding future research toward more robust, knowledge-grounded planning systems.

Abstract

Large language models (LLMs) have demonstrated tremendous potential in intelligent decision-making problems, with unprecedented capabilities across diverse applications ranging from gaming AI systems to complex strategic planning frameworks. However, the StarCraft II platform, which has been widely adopted over the past decade for validating decision-making algorithms, has not yet provided substantial support for this emerging domain. To address two issues, namely that LLMs cannot interface with the hundreds of actions of the pysc2 backend and that there is no native support for multi-agent (MA) collaboration, we propose the LLM-PySC2 environment. It is the first environment that offers LLMs the complete pysc2 action space together with sufficient multi-modal information and game Wiki knowledge. With an asynchronous query architecture, the environment interacts with LLMs efficiently, maintaining constant latency regardless of the size of the agent population. In the experiments, we evaluate LLMs' decision-making performance in both macro-decision and micro-operation scenarios, using traditional StarCraft II Multi-Agent Challenge (SMAC) tasks and a series of newly proposed ones. Results indicate that LLMs have the potential to win in complex scenarios but cannot consistently generate correct decisions, especially in the complete pysc2 action space and in MA settings. Without task-relevant instructions, the pre-trained models suffer from issues such as hallucinations and inefficient collaboration. Our findings suggest that StarCraft II remains challenging in the era of large models and that much remains to be done to develop an advanced LLM decision-making system; the proposed LLM-PySC2 environment will support the future development of LLM-based decision-making solutions.
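
The claim that latency stays constant as the agent population grows follows from issuing all agents' LLM queries concurrently rather than one after another: a step then takes roughly as long as the slowest single request. The sketch below illustrates that idea with Python's asyncio. It is a minimal sketch under stated assumptions, not LLM-PySC2's implementation: query_llm is a hypothetical stand-in that only sleeps to simulate model latency, and the agent names (CombatGroup0, CombatGroup1, ...) and the placeholder reply are illustrative.

```python
import asyncio
import time


async def query_llm(agent_name: str, observation: str, latency_s: float) -> str:
    """Hypothetical stand-in for a single LLM request.

    A real client would send `observation` to a model endpoint on behalf of
    `agent_name`; here asyncio.sleep only simulates network/generation latency.
    """
    await asyncio.sleep(latency_s)
    return "<No_Op()>"  # placeholder text action


async def query_all_agents(observations: dict[str, str], latency_s: float) -> dict[str, str]:
    """Send one request per agent concurrently and wait until all replies arrive."""
    names = list(observations)
    replies = await asyncio.gather(
        *(query_llm(name, observations[name], latency_s) for name in names)
    )
    return dict(zip(names, replies))


def timed_step(n_agents: int, latency_s: float = 0.2) -> tuple[dict[str, str], float]:
    """Run one decision step for n_agents; return (replies, wall-clock seconds)."""
    observations = {f"CombatGroup{i}": f"text observation {i}" for i in range(n_agents)}
    start = time.perf_counter()
    replies = asyncio.run(query_all_agents(observations, latency_s))
    return replies, time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 4, 16):
        replies, elapsed = timed_step(n)
        print(f"{n:2d} agents: {len(replies)} replies in {elapsed:.2f} s")
```

Because the simulated requests overlap, the wall-clock time per step stays close to a single request's latency instead of growing linearly with the number of agents. A thread pool would serve the same purpose; which concurrency mechanism LLM-PySC2 actually uses is not stated in this summary.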

Paper Structure

This paper contains 21 sections, 4 equations, 41 figures, 20 tables, and 2 algorithms.

Figures (41)

  • Figure 1: Contributions of the LLM-PySC2 environment. LLM-PySC2 is the first LLM decision-making platform that supports the complete pysc2 action space. With multi-modal observations and a native multi-agent system, this environment supports research on LLM-based planning, learning, and multi-modal information processing, with sufficiently complex evaluation scenarios.
  • Figure 2: LLM-PySC2 framework. In LLM-PySC2, the original PySC2 observation is transformed into a text-form or multi-modal observation. Text actions generated by the LLM are recognized and transformed into PySC2 functions, enabling LLMs to interact with the StarCraft II environment and control units.
  • Figure 3: Protoss action space and the recognition process. LLM-PySC2 is the first LLM decision-making environment with the complete pysc2 action space. The LLM controls units by outputting actions in the form <Action_Name(args)>. The environment transforms each text action into a pysc2 function according to a transform protocol and the action's corresponding bridge object.
  • Figure 4: LLM-PySC2 observations. LLM-PySC2 provides multi-modal observations. The observation wrapper generates text and image observations that contain all the information important for decision-making, with access to images of the screen, the minimap, and all of the original pysc2 feature maps.
  • Figure 5: LLM-PySC2 multi-agent system. In LLM-PySC2, game control is divided into combat ((1), (3)) and development ((2), (4)). In standard unit control mode, the Commander agent sends messages to agents named CombatGroup_i, and each CombatGroup agent controls its units to move, attack, or use skills to accomplish the tasks assigned by its superiors. In standard build mode, the Developer agent trains units, upgrades technologies, and asks the Builder agent to construct buildings. The Builder then controls workers and chooses positions for the new buildings.
  • ...and 36 more figures
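
Figure 3 describes LLMs emitting text actions of the form <Action_Name(args)>, which the environment converts into pysc2 functions through a transform protocol and per-action bridge objects. The sketch below shows one way such a recognizer could work: a regular expression extracts each action, the arguments are parsed as Python literals, and a lookup table maps the action to a function name with an expected argument count. This is a hypothetical illustration, not LLM-PySC2's actual protocol. TRANSFORM_TABLE, the text action names, and the arity check are assumptions. Move_screen, Attack_screen, and Stop_quick are pysc2 function identifiers, but they are kept as plain strings here and pysc2 argument conventions (such as the queued flag) are omitted, so the sketch runs without pysc2 installed.

```python
import ast
import re
from typing import Any, Optional

# Matches text actions of the form <Action_Name(args)>.
ACTION_PATTERN = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)\((.*?)\)>")

# Hypothetical transform table: text action name -> (pysc2 function name, arity).
# Function names are plain strings so this sketch runs without pysc2 installed.
TRANSFORM_TABLE: dict[str, tuple[str, int]] = {
    "Move_Screen": ("Move_screen", 1),
    "Attack_Screen": ("Attack_screen", 1),
    "Stop": ("Stop_quick", 0),
}


def parse_args(raw_args: str) -> Optional[tuple[Any, ...]]:
    """Parse a comma-separated argument list as Python literals; None if malformed."""
    if not raw_args.strip():
        return ()
    try:
        return ast.literal_eval(f"({raw_args},)")
    except (ValueError, SyntaxError, TypeError):
        return None


def to_function_calls(reply: str) -> tuple[list[tuple[str, tuple[Any, ...]]], list[str]]:
    """Translate every text action in an LLM reply; collect the rejected ones."""
    valid: list[tuple[str, tuple[Any, ...]]] = []
    rejected: list[str] = []
    for name, raw_args in ACTION_PATTERN.findall(reply):
        args = parse_args(raw_args)
        entry = TRANSFORM_TABLE.get(name)
        # Reject unknown actions, unparsable arguments, or a wrong argument count.
        if entry is None or args is None or len(args) != entry[1]:
            rejected.append(name)
            continue
        valid.append((entry[0], args))
    return valid, rejected


if __name__ == "__main__":
    reply = "Retreat first: <Move_Screen([32, 40])> then <Stop()>. Also <Fly_Away(3)>."
    print(to_function_calls(reply))
```

For the sample reply, the two known actions are translated and the hallucinated <Fly_Away(3)> is rejected. Returning rejected action names separately, rather than raising, would let an environment report invalid actions back to the LLM instead of silently dropping them.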