Scaling Strategy, Not Compute: A Stand-Alone, Open-Source StarCraft II Benchmark for Accessible Reinforcement Learning Research

Sourav Panda; Shreyash Kale; Tanmay Ambadkar; Abhinav Verma; Jonathan Dodge

TL;DR

The Two-Bridge Map Suite is purposely engineered as an intermediate benchmark that sits between StarCraft II's full game and its mini-games. Preliminary experiments show that agents learn coherent maneuvering and engagement behaviors without incurring full-game computational costs.

Abstract

The research community lacks a middle ground between StarCraft II's full game and its mini-games. In the full game, the sprawling state-action space makes reward signals sparse and noisy, whereas on mini-games even simple agents quickly saturate performance. This complexity gap hinders gradual curriculum design and prevents researchers from experimenting with modern reinforcement learning (RL) algorithms in real-time strategy (RTS) environments under realistic compute budgets. To fill this gap, we present the Two-Bridge Map Suite, the first entry in an open-source benchmark series deliberately engineered to sit between these extremes. By disabling economy mechanics (resource collection and base building) as well as fog-of-war, the environment isolates two core tactical skills: long-range navigation and micro-combat. Preliminary experiments show that agents learn coherent maneuvering and engagement behaviors without incurring full-game computational costs. Two-Bridge is released as a lightweight, Gym-compatible wrapper on top of PySC2, with maps, wrappers, and reference scripts fully open-sourced to encourage broad adoption as a standard benchmark.
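Because the suite is described as a Gym-compatible wrapper on top of PySC2, it may help to see how such a wrapper typically maps PySC2's per-step `TimeStep` (fields `step_type`, `reward`, `discount`, `observation`) onto the Gymnasium-style `(obs, reward, terminated, truncated, info)` tuple. The sketch below is not the released wrapper. The class names `FakeSC2Env` and `GymStyleWrapper` and the `max_agent_steps` parameter are hypothetical, `reset` is simplified (no `seed`/`options` arguments), and a local stand-in replaces the real PySC2 environment so the example runs without StarCraft II installed.

```python
import enum
from collections import namedtuple


# Local stand-ins that mirror pysc2.env.environment.StepType and TimeStep,
# so this sketch runs without PySC2 or StarCraft II installed.
class StepType(enum.IntEnum):
    FIRST = 0
    MID = 1
    LAST = 2


TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])


class FakeSC2Env:
    """Hypothetical stand-in for a one-agent PySC2 environment.

    The game ends on its own (StepType.LAST, reward 1.0) after `episode_len` steps.
    """

    def __init__(self, episode_len):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        # PySC2 returns a tuple with one TimeStep per agent.
        return (TimeStep(StepType.FIRST, 0.0, 1.0, {"t": 0}),)

    def step(self, actions):
        self.t += 1
        done = self.t >= self.episode_len
        return (TimeStep(StepType.LAST if done else StepType.MID,
                         1.0 if done else 0.0,
                         0.0 if done else 1.0,
                         {"t": self.t}),)


class GymStyleWrapper:
    """Hypothetical adapter from PySC2-style TimeSteps to the Gymnasium
    (obs, reward, terminated, truncated, info) convention."""

    def __init__(self, env, max_agent_steps):
        self.env = env
        self.max_agent_steps = max_agent_steps  # wrapper-side step budget
        self.steps = 0

    def reset(self):
        self.steps = 0
        ts = self.env.reset()[0]
        return ts.observation, {}

    def step(self, action):
        ts = self.env.step([action])[0]
        self.steps += 1
        terminated = ts.step_type == StepType.LAST  # the game itself ended
        # Truncate only if the step budget runs out before the game ends.
        truncated = (not terminated) and self.steps >= self.max_agent_steps
        return ts.observation, ts.reward, terminated, truncated, {}
```

For example, wrapping `FakeSC2Env(episode_len=3)` with `max_agent_steps=10` yields `terminated=True` on the third step, while a budget shorter than the episode yields `truncated=True` instead. Keeping the two flags separate lets value-based learners bootstrap through time-limit cut-offs, which is the usual reason Gymnasium distinguishes them.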

Paper Structure (25 sections, 12 figures)

Introduction
Our Contribution
Two-Bridge Map Suite
Core Map Components
Game Setting and Episode Semantics
Map Variations and Strategic Diagnostics
Experiments and Training Recipes
Experiment 1: Pilot Training Strategy
Experiment 2: Action Masking
Experiment 3: Camera Lock
Training Protocol and Compute Budget
Results and Qualitative Analysis
Experiment 2: Qualitative Behavior
Experiment 3: Qualitative Behavior
Discussion, Limitations, and Future Work
(10 further sections not listed)

Figures (12)

Figure 1: The benchmark gap in StarCraft II. The full game supports highly complex strategic behavior but requires extreme compute and complex, replay-driven training pipelines, making it infeasible for most researchers. In contrast, SC2 mini-games and SMAC are lightweight and reproducible under limited compute budgets, but they isolate narrow skills with limited strategic depth. The proposed Two-Bridge Map Suite bridges the gap between mini-games and full-game SC2 by introducing intermediate decision horizons and objective structure, while remaining interactive, replay-free, reproducible, and accessible under standard compute budgets.
Figure 2: Two-Bridge Map overview and variability. (Left) Map layout with impassable terrain and predefined spawn regions. (Middle) Representative episode initializations showing different placements of unit groups and the beacon within the predefined spawn regions. (Right) Objectives, either navigation toward a beacon (top) or combat against enemy units (bottom), within a 5-minute time limit.
Figure 3: Two-Bridge map variants and strategic diagnostics. Each map is defined by the cross-product of a layout-induced proximity bias and a unit-count regime. Takeaway: Each resulting configuration poses a strategic diagnostic that probes how agents arbitrate between objectives when structural affordances and numerical incentives are placed in tension.
Figure 4: Camera-lock observation setup used in Experiment 3. Left (legend): main-screen view vs. minimap view. Middle (camera free): the main-screen view does not track unit movement. Right (camera lock): the main-screen view is centered on the friendly unit group at every timestep. Takeaway: Camera locking ensures that screen-level observations consistently contain the relevant units and nearby terrain without introducing camera-control actions.
Figure 5: Terminal outcome distributions for Experiment 2 (top) and Experiment 3 (bottom) across all variants. Each plot corresponds to a variant layout and aggregates outcomes across all three unit-count settings. Colors indicate map difficulty: V1 (green), V2 (orange), and V3 (red).
(7 further figures not listed)
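The captions above describe a fixed episode budget (Figure 2), a cross-product of map variants (Figure 3), and per-layout outcome distributions pooled over the three unit-count settings (Figure 5). The Python sketch below illustrates that bookkeeping under stated assumptions and is not the released code. The layout and unit-count labels, the outcome names, and their precedence order are hypothetical (the number of layouts is not given in the captions, so three is assumed). Converting the 5-minute limit into game loops assumes game time at SC2's "Faster" speed, which runs at 22.4 game loops per second.

```python
from collections import Counter, defaultdict
from itertools import product

# --- Time limit (Figure 2: 5-minute episodes) ---
GAME_LOOPS_PER_SECOND = 22.4  # SC2 game loops per second at "Faster" speed
TIME_LIMIT_SECONDS = 5 * 60
# round() guards against floating-point error in the product.
MAX_GAME_LOOPS = round(TIME_LIMIT_SECONDS * GAME_LOOPS_PER_SECOND)  # 6720

# --- Variants (Figure 3): layout bias x unit-count regime ---
# Hypothetical placeholder labels; only the cross-product structure and the
# count of three unit-count settings come from the captions.
LAYOUT_BIASES = ("beacon_nearer", "equidistant", "enemy_nearer")
UNIT_COUNT_REGIMES = ("few", "medium", "many")


def enumerate_variants(layouts=LAYOUT_BIASES, regimes=UNIT_COUNT_REGIMES):
    """Return one (layout, regime) pair per map variant."""
    return list(product(layouts, regimes))


# --- Terminal outcomes (hypothetical names and precedence) ---
def classify_outcome(beacon_reached, friendly_alive, enemy_alive, game_loop):
    """Return a terminal outcome label, or None if the episode continues."""
    if beacon_reached:
        return "beacon_reached"
    if enemy_alive == 0:
        return "enemy_eliminated"
    if friendly_alive == 0:
        return "friendly_eliminated"
    if game_loop >= MAX_GAME_LOOPS:
        return "timeout"
    return None


# --- Figure 5-style aggregation: per layout, pooled over unit-count regimes ---
def outcome_distribution(episodes):
    """episodes: iterable of (layout, regime, outcome) triples.

    Returns {layout: {outcome: fraction}}. The regime is ignored so that
    each layout's distribution pools all unit-count settings.
    """
    counts = defaultdict(Counter)
    for layout, _regime, outcome in episodes:
        counts[layout][outcome] += 1
    return {
        layout: {o: n / sum(c.values()) for o, n in c.items()}
        for layout, c in counts.items()
    }
```

With the placeholder labels, `enumerate_variants()` yields nine (layout, regime) pairs. Pooling in `outcome_distribution` mirrors the Figure 5 description of aggregating across unit-count settings; keeping the regime in each triple means a per-regime breakdown needs only a different grouping key.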
