SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC

Yue Deng, Yan Yu, Weiyu Ma, Zirui Wang, Wenhui Zhu, Jian Zhao, Yin Zhang

TL;DR

SMAC-HARD addresses benchmark saturation in MARL by introducing an editable, diverse opponent framework for SMAC, including opponent-script editing, randomized strategy mixing, and synchronized self-play interfaces. It uses an LLM-guided planning pipeline to generate opponent and agent decision trees, which are converted into PySC2-compatible scripts, and adds a black-box evaluation that tests policy coverage on unseen adversaries. Experimental results show that leading MARL algorithms struggle under edited and mixed-opponent conditions, highlighting the limited transferability of policies trained against a single strategy. By offering a configurable testing ground and an accompanying evaluation framework, SMAC-HARD aims to drive the development of more robust, self-play-oriented MARL methods that transfer to diverse opponents.

Abstract

The availability of challenging simulation environments is pivotal for advancing the field of Multi-Agent Reinforcement Learning (MARL). In cooperative MARL settings, the StarCraft Multi-Agent Challenge (SMAC) has gained prominence as a benchmark for algorithms following the centralized training with decentralized execution paradigm. However, with continual advancements on SMAC, many algorithms now exhibit near-optimal performance, complicating the evaluation of their true effectiveness. In this work, we highlight a critical issue behind this problem: the default opponent policy in these environments lacks sufficient diversity, leading MARL algorithms to overfit and exploit unintended vulnerabilities rather than learn robust strategies. To overcome these limitations, we propose SMAC-HARD, a novel benchmark designed to enhance training robustness and evaluation comprehensiveness. SMAC-HARD supports customizable opponent strategies, randomization of adversarial policies, and interfaces for MARL self-play, enabling agents to generalize to varying opponent behaviors and improving model stability. Furthermore, we introduce a black-box testing framework in which agents are trained without exposure to the edited opponent scripts but are tested against these scripts to evaluate the policy coverage and adaptability of MARL algorithms. We conduct extensive evaluations of widely used and state-of-the-art algorithms on SMAC-HARD, revealing the substantial challenges posed by edited and mixed-strategy opponents. Additionally, the black-box strategy tests illustrate the difficulty of transferring learned policies to unseen adversaries. We envision SMAC-HARD as a critical step toward benchmarking the next generation of MARL algorithms, fostering progress in self-play methods for multi-agent systems. Our code is available at https://github.com/devindeng94/smac-hard.

Paper Structure

This paper contains 22 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: (a) Agents spawn at the Team 1 point and the opponent units spawn at the Team 2 point. (b) The internal opponent script is defined in the SC2Map file and orders all opponent units to attack toward the Team 1 point. The trigger reads: Order all units in (Any units in (Entire map) owned by player 2 matching Excluded: Missile, Dead, Hidden, with at most Any Amount) to (Attack targeting Team 1) (Replace Existing Orders).
  • Figure 2: (a) Three opponent Zealot units arrive at the agents' starting point and become stuck there. (b) MARL algorithms easily achieve high performance once this tricky strategy has been explored.
  • Figure 3: Learning curves of QMIX and MAPPO models when facing opponents that attack the nearest enemy (N), attack the weakest enemy (W), or randomly choose between the two strategies (M). The x-axis is the number of time steps (1e6) being evaluated and the y-axis is the average win rate of 5 different seeds from 32 evaluation processes.
  • Figure 4: The unit information, map information, and task description serve as a system prompt and are passed to the planner. The planner plans a strategy for each side and the coders implement the strategies accordingly. The resulting Python scripts then play the red and blue sides in a SMAC-HARD simulation. Finally, the critic module analyzes the simulation results and provides refinement suggestions to the planner and the coders.
  • Figure 5: The overall architecture of our proposed SMAC-HARD, showing the opponent decision script, the self-play interface, and the SMAC, PySC2, and StarCraft II modules.
  • ...and 2 more figures
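
The mixed-opponent setting named in the Figure 3 caption ('attacking the nearest enemy', 'attacking the weakest enemy', and a random choice between the two) can be illustrated with a minimal sketch. This is not the SMAC-HARD implementation: the `Unit` class, the function names, and the assumption that the strategy is sampled once per episode are all hypothetical and exist only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical stand-in for a unit observation (not a PySC2 type)."""
    x: float
    y: float
    health: float


def nearest_target(me: Unit, enemies: list) -> Unit:
    """Strategy N: pick the enemy closest to this unit."""
    return min(enemies, key=lambda e: math.hypot(e.x - me.x, e.y - me.y))


def weakest_target(me: Unit, enemies: list) -> Unit:
    """Strategy W: pick the enemy with the lowest remaining health."""
    return min(enemies, key=lambda e: e.health)


# Labels follow the Figure 3 caption; the mapping itself is illustrative.
STRATEGIES = {"N": nearest_target, "W": weakest_target}


def sample_strategy(rng: random.Random, mode: str = "M"):
    """Return (label, target function).

    Mode 'M' draws N or W uniformly at random; calling this once per
    episode is an assumption, not something the source specifies.
    """
    if mode == "M":
        mode = rng.choice(sorted(STRATEGIES))
    return mode, STRATEGIES[mode]
```

A training loop would call `sample_strategy` at episode reset and then use the returned function to choose each opponent unit's attack target, so an agent trained in mode M faces both target-selection rules instead of one.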