StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments
Sean Kulinski, Nicholas R. Waytowich, James Z. Hare, David I. Inouye
TL;DR
StarCraftImage provides a scalable, easy-to-use benchmark for prototyping spatial reasoning in multi-agent environments. It converts 60,000 StarCraft II replays of human games into 3.6 million summary images in hyperspectral, RGB, and grayscale representations. The pipeline preserves rich unit-level metadata and supports a range of tasks, from unit-type identification to movement prediction and outcome classification. It also offers data-corruption models and domain-adaptation modifiers to stress-test methods. Benchmark evaluations on clean and corrupted data demonstrate the dataset's utility for evaluating robust spatial reasoning models, and a preliminary transfer study to the DOTA satellite-imagery domain suggests real-world relevance. By releasing the code and data loaders under permissive licenses, StarCraftImage lowers the barrier to rapid experimentation and reproducibility in multi-agent spatial reasoning research.
Abstract
Spatial reasoning tasks in multi-agent environments, such as event prediction, agent type identification, or missing data imputation, are important in multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Inspired by the simplicity of these datasets, we construct a benchmark spatial reasoning dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while remaining as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping spatial reasoning methods. All datasets, code for extraction, and code for dataset loading can be found at https://starcraftdata.davidinouye.com.
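To make the summarization step concrete, the sketch below shows one way a window of up to 255 game states could be collapsed into the three formats. It is a minimal sketch under assumptions of our own, not the released extraction code. The following are all hypothetical: the input layout (per-frame lists of `(x, y, unit_type, owner)` tuples), the owner-to-channel mapping (0 = player 1, 1 = player 2, 2 = neutral), the function name `summarize_window`, and the recency encoding, in which each pixel stores the 1-based index of the last frame where a unit occupied it. Under that encoding, 255 frames fit exactly in `uint8`, with 0 meaning "unoccupied". The sketch also omits any resizing to MNIST-sized (28×28) or CIFAR10-sized (32×32) grids.

```python
import numpy as np

# Hypothetical constant: number of consecutive game states per summary image.
WINDOW = 255


def summarize_window(frames, num_unit_types, height, width):
    """Hypothetical sketch of collapsing a window of game states into images.

    frames: sequence of length <= WINDOW; frames[t] is an iterable of
        (x, y, unit_type, owner) tuples, with owner in {0, 1, 2}
        (assumed mapping: player 1, player 2, neutral).
    Returns (hyper, rgb, gray) as uint8 arrays of shapes
        (num_unit_types, height, width), (height, width, 3), (height, width).
    Each pixel holds the 1-based index of the most recent frame in which a
    matching unit occupied it, or 0 if it was never occupied (assumed encoding).
    """
    if len(frames) > WINDOW:
        raise ValueError(f"window has {len(frames)} frames; at most {WINDOW} fit in uint8")

    hyper = np.zeros((num_unit_types, height, width), dtype=np.uint8)
    rgb = np.zeros((height, width, 3), dtype=np.uint8)

    for t, units in enumerate(frames):
        stamp = t + 1  # 1..255, so 0 can mean "never occupied"
        for x, y, unit_type, owner in units:
            # Frames are visited in order, so overwriting keeps the latest time.
            hyper[unit_type, y, x] = stamp  # one channel per unit type
            rgb[y, x, owner] = stamp        # one channel per owner (assumed)

    # Grayscale: most recent occupancy by any unit, regardless of owner.
    gray = rgb.max(axis=2)
    return hyper, rgb, gray


# Usage: a toy 3-frame window on a 2x2 grid with 3 unit types.
frames = [
    [(0, 0, 1, 0)],
    [(1, 0, 2, 1)],
    [(0, 0, 1, 0), (1, 1, 0, 2)],
]
hyper, rgb, gray = summarize_window(frames, num_unit_types=3, height=2, width=2)
print(gray.tolist())  # [[3, 2], [0, 3]]
```

Because frames are visited in chronological order, plain assignment is equivalent to taking a running maximum of timestamps per pixel. The grayscale image is then the per-pixel maximum across the RGB channels, which here equals the maximum across all hyperspectral channels.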
