StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments

Sean Kulinski, Nicholas R. Waytowich, James Z. Hare, David I. Inouye

TL;DR

StarCraftImage provides a scalable, easy-to-use benchmark for prototyping spatial reasoning in multi-agent environments by converting 60,000 human StarCraft II replays into 3.6 million summary images across hyperspectral, RGB, and grayscale representations. The pipeline preserves rich unit-level metadata and supports a suite of tasks from unit-type identification to movement prediction and outcome classification, while offering data-corruption models and domain-adaptation modifiers to stress-test methods. Benchmark evaluations on clean and corrupted data demonstrate the dataset’s utility for evaluating robust spatial reasoning models, and a preliminary transfer study to the DOTA satellite-imagery domain suggests real-world relevance. By releasing its extraction code and data loaders under permissive licensing terms, StarCraftImage lowers the barrier to rapid experimentation and reproducibility in multi-agent spatial reasoning research.

Abstract

Spatial reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark spatial reasoning dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: Hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping spatial reasoning methods. All datasets, code for extraction, and code for dataset loading can be found at https://starcraftdata.davidinouye.com

Paper Structure (41 sections, 15 figures, 7 tables)

This paper contains 41 sections, 15 figures, 7 tables.

Figures (15)

  • Figure 1: Two samples (one per row) showing (Blue box/left) our 64 x 64 StarCraftHyper dataset which contains all unit IDs and corresponding values for both players (color for unit IDs denotes categorical unit ids), (Green box/middle) StarCraftCIFAR10 (32 x 32) which is easy to interpret where blue is player 1, red is player 2, and green are neutral units such as terrain or resources, and (Orange box/right) StarCraftMNIST (28 x 28) which are grayscale images further simplified to show player 1 as light-gray, player 2 as dark-gray, and neutral as medium-level shades of gray.
  • Figure 2: An overview of our hyperspectral dataset from different perspectives. The raw image data is stored in .png files using the bag-of-units representation. A logical view of the dataset is a (sparse) hyperspectral image with many channels that include unit information and visibility per player, resource information (neutral units), and map information. The bag-of-units representation enables processing this very high-dimensional dataset using dense matrices only and leveraging embedding layers that are often used for processing sequences of IDs; importantly, because the unit order does not matter, an order-invariant reduction such as max or sum should be used to arrive at a representation with a fixed number of embedding channels $E$.
  • Figure 3: (Top) We embed the unit information of player 1, player 2, and neutral separately using an embedding of size 1. We then combine with other dense features (visibility for players and terrain info for neutral). Finally, we concatenate each output into a 3-channel 32x32 px RGB image where the neutral channel is down-weighted for visual clarity. (Bottom) We take the RGB color image, rescale the values of each channel, and overlay each channel into a single grayscale 28x28 px image where precedence is given to P1, then P2, and finally neutral or background. We combine the channels by precedence because a linear combination of the layers could cause unit information to cancel out.
  • Figure 4: (Left) Our 64 x 64 StarCraftHyper dataset contains all unit IDs and corresponding values for both players (color for unit IDs denotes categorical unit ids) where visibility is player specific but the terrain and pathing grid are shared (a few other layers are not shown, see appendix). (Middle) StarCraftCIFAR10 (32 x 32) is easy to interpret where blue is player 1, red is player 2, and green is neutral units, which are usually just resources. (Right) StarCraftMNIST (28 x 28) grayscale images are further simplified to show player 1 as white to white-gray, player 2 as black to black-gray, and neutral as shades of gray.
  • Figure 5: Three example noise corruption models simulated on top of the StarCraftCIFAR10 dataset, where (left) simulates random additive noise, (middle) simulates observations via a heterogeneous sensor network (SN), and (right) simulates limited-precision (blurry) observations.
  • ...and 10 more figures
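
The captions of Figures 2 and 3 describe two steps that a short sketch may make more concrete: reducing a bag-of-units image with an order-invariant max, and overlaying player and neutral channels by precedence. The NumPy code below is only an illustration under stated assumptions and is not the dataset's released loader. The function names, the array layout (K unit slots per pixel, with ID 0 meaning an empty slot), the non-negative embedding table, and the gray-level bands are all hypothetical choices made for this example.

```python
import numpy as np


def embed_bag_of_units(unit_ids, unit_values, embedding):
    """Order-invariant embedding of a bag-of-units image (hypothetical layout).

    unit_ids:    (K, H, W) int array, K unit slots per pixel; 0 = empty slot (assumed).
    unit_values: (K, H, W) float array with the value attached to each slot.
    embedding:   (num_unit_ids, E) non-negative float array, row 0 all zeros (assumed).
    Returns a dense (E, H, W) image.
    """
    looked_up = embedding[unit_ids]                 # (K, H, W, E) via fancy indexing
    weighted = looked_up * unit_values[..., None]   # scale each slot by its value
    reduced = weighted.max(axis=0)                  # max over slots ignores slot order
    return np.moveaxis(reduced, -1, 0)              # (E, H, W)


def overlay_with_precedence(p1, p2, neutral):
    """Grayscale overlay where P1 overrides P2, which overrides neutral/background.

    Inputs are (H, W) arrays in [0, 1]. The gray bands are illustrative only:
    neutral maps to [0.4, 0.6], P2 to [0.0, 0.3), P1 to (0.7, 1.0].
    """
    gray = 0.4 + 0.2 * neutral                      # lowest precedence
    gray = np.where(p2 > 0, 0.3 * (1.0 - p2), gray) # P2 overwrites neutral
    gray = np.where(p1 > 0, 0.7 + 0.3 * p1, gray)   # P1 overwrites everything
    return gray


# Tiny example: K=2 slots, a 1x2 image, 3 unit IDs, E=1 embedding channel.
unit_ids = np.array([[[1, 0]], [[2, 0]]])
unit_values = np.array([[[1.0, 0.0]], [[0.5, 0.0]]])
embedding = np.array([[0.0], [0.2], [0.9]])
dense = embed_bag_of_units(unit_ids, unit_values, embedding)

# Pixel 0 holds both players, pixel 1 only P2, pixel 2 only a neutral unit.
p1 = np.array([[1.0, 0.0, 0.0]])
p2 = np.array([[1.0, 1.0, 0.0]])
neutral = np.array([[0.0, 0.0, 1.0]])
gray = overlay_with_precedence(p1, p2, neutral)
```

In this toy setup, a signed linear mix such as `p1 - p2` would give 0 at the pixel where both players are present, making it look empty. The precedence overlay keeps the P1 value at that pixel, which is the cancelation issue the Figure 3 caption points to.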