A Competition Winning Deep Reinforcement Learning Agent in microRTS

Scott Goodfriend

TL;DR

RAISocketAI is the first deep reinforcement learning agent to win the IEEE microRTS competition, demonstrating that map-specific fine-tuning and transfer learning enable competitive DRL agents in a resource-constrained RTS setting. The approach combines dual backbones (DoubleCone and squnet), a three-headed value function, and a training curriculum that shifts from shaped to sparse rewards, augmented by behavior cloning bootstraps for efficiency. Transfer learning to map-specific policies and imitation-learning-based bootstraps reduced training time while achieving high win rates on multiple Open maps and strong performance in competition settings. The work highlights practical pathways to deploy DRL in academic RTS research, emphasizing inference-time improvements, curriculum design, and scalable training strategies for future competitions and studies.
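The shift from shaped to sparse rewards mentioned above can be pictured as a weighted blend whose weight on the shaped term decays over training. The following is a minimal sketch only: this summary does not give the paper's actual schedule, reward terms, or step counts, so the linear interpolation and the names `blend_weight`, `curriculum_reward`, `start`, and `end` are all illustrative assumptions.

```python
def blend_weight(step: int, start: int, end: int) -> float:
    """Weight placed on the shaped reward at a given training step.

    Assumed schedule (not from the paper): 1.0 up to `start`,
    0.0 from `end` onward, and linear in between.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if step <= start:
        return 1.0
    if step >= end:
        return 0.0
    return 1.0 - (step - start) / (end - start)


def curriculum_reward(shaped: float, sparse: float,
                      step: int, start: int, end: int) -> float:
    """Blend a dense shaped reward with a sparse win/loss reward."""
    w = blend_weight(step, start, end)
    return w * shaped + (1.0 - w) * sparse
```

Early in training the agent sees mostly the dense shaped signal; by the end only the sparse outcome remains, so the final policy is optimized for winning rather than for the intermediate shaping terms.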

Abstract

Scripted agents have predominantly won the five previous iterations of the IEEE microRTS ($μ$RTS) competitions hosted at CIG and CoG. Despite Deep Reinforcement Learning (DRL) algorithms making significant strides in real-time strategy (RTS) games, their adoption in this primarily academic competition has been limited due to the considerable training resources required and the complexity inherent in creating and debugging such agents. RAISocketAI is the first DRL agent to win the IEEE microRTS competition. In a benchmark without performance constraints, RAISocketAI regularly defeated the two prior competition winners. This first competition-winning DRL submission can be a benchmark for future microRTS competitions and a starting point for future DRL research. Iteratively fine-tuning the base policy and transfer learning to specific maps were critical to RAISocketAI's winning performance. These strategies can be used to economically train future DRL agents. Further work in Imitation Learning using Behavior Cloning and fine-tuning these models with DRL has proven promising as an efficient way to bootstrap models with demonstrated, competitive behaviors.

Paper Structure (31 sections, 2 equations, 6 figures, 27 tables)


Introduction
Related Work
MicroRTS-Py
DeepMind's AlphaStar
Lux AI Kaggle Competitions
Methods
Neural Network Architecture
Base Model Training
Transfer Learning
Squnet Training
Behavior Cloning Bootstrapped Training
Results
Single Player Round-robin Benchmark
IEEE-CoG 2023 microRTS Competition Results
Behavior Cloning Results
...and 16 more sections

Figures (6)

Figure 1: Open competition maps.
Figure 2: DoubleCone architecture.
Figure 3: DoubleCone(4, 6, 4) neural network architecture.
Figure 4: ResBlock used in DoubleCone, squnet32, and squnet64. The residual block is similar to a standard residual block but inserts a Squeeze-Excitation block after the convolutional layers and before the residual connection.
Figure 5: Value heads used in (from left to right) DoubleCone, squnet32, and squnet64. The AdaptiveAvgPool2d layer allows the network to be used on various map sizes.
...and 1 more figure
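The Figure 5 caption notes that an AdaptiveAvgPool2d layer lets the value heads work across map sizes. The sketch below illustrates why, using plain Python rather than the paper's code: pooling every channel down to a single average (what adaptive average pooling computes when its output size is 1x1) yields a vector whose length depends only on the channel count, not on the map's height or width. The helper names `global_avg_pool` and `make_map` are hypothetical, and the pooled output size used in the actual network is not stated in this summary.

```python
from typing import List

# channels x height x width, stored as nested lists
FeatureMap = List[List[List[float]]]


def global_avg_pool(features: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions.

    Assumes every channel is non-empty. The result has one value
    per channel regardless of the map's height and width.
    """
    pooled = []
    for channel in features:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def make_map(channels: int, height: int, width: int, value: float) -> FeatureMap:
    """Build a constant-valued feature map for demonstration."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Two different map sizes produce pooled vectors of the same length.
small = global_avg_pool(make_map(3, 8, 8, 1.0))
large = global_avg_pool(make_map(3, 16, 16, 1.0))
```

Because the pooled vector has a fixed length, any fully connected layers that follow it can be shared across maps of different dimensions.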
