Carefully Structured Compression: Efficiently Managing StarCraft II Data

Bryce Ferenczi, Rhys Newbury, Michael Burke, Tom Drummond

TL;DR

We introduce a serialization framework for StarCraft II that reduces the cost of dataset creation and storage and improves usage ergonomics. Deep learning models trained on the resulting dataset exceed the performance of comparable models trained on other datasets.

Abstract

Creation and storage of datasets are often overlooked input costs in machine learning, as many datasets are simple image-label pairs or plain text. However, datasets with more complex structures, such as those from the real-time strategy game StarCraft II, require more deliberate thought and strategy to reduce the cost of ownership. We introduce a serialization framework for StarCraft II that reduces the cost of dataset creation and storage and improves usage ergonomics. We benchmark against the most comparable existing dataset, from AlphaStar-Unplugged, and highlight the benefit of our framework in terms of both the cost of creation and storage. We use our dataset to train deep learning models that exceed the performance of comparable models trained on other datasets. The dataset conversion and usage framework is open source and can serve as a framework for datasets with similar characteristics, such as digital twin simulations. Pre-converted StarCraft II tournament data is also available online.

Paper Structure

This paper contains 19 sections, 1 equation, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Compute resources measured by querying the Docker API when serializing 20 replays. AlphaStar-Unplugged took 4217 seconds to complete, whereas sc2-serializer took 3828 seconds.
  • Figure 2: Serialization size of 20 randomly selected (biased towards longer duration) replays from StarCraft II. Note that sc2-serializer includes a constant $16$ MB index, which becomes negligible as the dataset grows.
  • Figure 3: Filesize contribution breakdown of 20 randomly selected replays after serialization.
  • Figure 4: Evaluation using the CNN+MLP Global POV model on both in-distribution and out-of-distribution data. Legend is formatted as: Training Set (Testing Set).
  • Figure 5: Forecasted ego-agent unit occupancy on the minimap $3$ s into the future. The true-positive threshold is $P>0.5$. True positives are shown in green, false negatives in red, and false positives in blue.
  • ...and 2 more figures