SC-Phi2: A Fine-tuned Small Language Model for StarCraft II Macromanagement Tasks

Muhammad Junaid Khan, Gita Sukthankar

TL;DR

This paper introduces SC-Phi2, a fine-tuned StarCraft II small language model for macromanagement tasks, and demonstrates that the model performs well at macromanagement tasks such as build order and global state prediction while using a small number of parameters.

Abstract

This paper introduces SC-Phi2, a fine-tuned StarCraft II small language model for macromanagement tasks. Small language models, such as Phi-2, Gemma, and DistilBERT, are streamlined versions of large language models (LLMs) with fewer parameters, so they require less power and memory to run. To teach Microsoft's Phi-2 model about StarCraft, we create a new SC2 text dataset with information about StarCraft races, roles, and actions, and use it to fine-tune Phi-2 with self-supervised learning. We pair this language model with a Vision Transformer (ViT) from the pre-trained BLIP-2 (Bootstrapping Language-Image Pre-training) model and fine-tune the resulting model on the MSC replay dataset. This lets us construct dynamic prompts that include visual game state information. Unlike the large models used in prior StarCraft LLM work, such as GPT-3.5, Phi-2 is trained primarily on textbook data and has little inherent knowledge of StarCraft II beyond what our training process provides. Using LoRA (Low-Rank Adaptation) and quantization, we can train our model on a single GPU. We demonstrate that our model performs well at macromanagement tasks such as build order and global state prediction while using a small number of parameters.
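The abstract credits LoRA and quantization with making single-GPU training feasible. The sketch below illustrates only the LoRA part, in plain Python, assuming the standard formulation in which a frozen weight W (d × k) is augmented by a scaled low-rank product, W' = W + (alpha / r) · B A, with B (d × r) initialized to zero and A (r × k) initialized randomly. The helper names, the scaling factor alpha, and the layer size and rank in the demo are hypothetical illustrations, not the authors' code or settings; quantization is not shown.

```python
import random


def matmul(x, y):
    """Multiply matrices stored as lists of rows: (n x m) @ (m x p) -> (n x p)."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions do not match")
    cols = len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(x))]


def init_lora(d, k, r, seed=0):
    """Create LoRA factors A (r x k) and B (d x r) for a frozen d x k weight.

    A gets small random values and B starts at zero, so B @ A is zero at
    initialization and the adapted layer initially matches the frozen one.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d)]
    return A, B


def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) without modifying the frozen W."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


def lora_trainable_params(d, k, r):
    """Number of trainable values in B (d x r) plus A (r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 2560, 2560, 16  # hypothetical layer size and rank
    full = d * k
    lora = lora_trainable_params(d, k, r)
    print(f"trainable: {lora} of {full} ({lora / full:.2%})")
```

With the hypothetical 2560 × 2560 layer and rank r = 16, LoRA trains 16 × (2560 + 2560) = 81,920 values instead of the 6,553,600 in the full matrix, about 1.25%. During training, only A and B would be updated while W stays frozen, which is what keeps memory requirements low.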

Paper Structure

This paper contains 21 sections, 2 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: SC-Phi2 model. Spatial features represent screen and mini-map features, while global features represent supplies and scores. During training, we construct a dynamic prompt from the global features and from the textual descriptions generated by the pre-trained vision encoder (ViT) of the BLIP-2 vision-language model. The language backbone is the Phi-2 model fine-tuned in stage 1, and we again fine-tune about 4% of its parameters using LoRA.
  • Figure 2: LoRA adaptation of the language backbone. (a) The general LoRA process. (b) LoRA applied to specific layers in our approach. Red blocks represent the weights updated during training, and blue blocks denote frozen weights. A and B are low-rank matrices, and r is the LoRA rank hyperparameter.
  • Figure 3: Prompt used during stage-2 fine-tuning. Numerical values are converted to categorical values during training; for example, in this prompt the Game Stage feature takes the value Early and Army Count the value low. All other numerical features are converted in the same way.
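Figures 1 and 3 describe a dynamic prompt that combines global features, converted from numbers to categories, with a textual description of the visual game state. The following is a minimal sketch of that idea, assuming simple threshold binning. The bin edges, the units (game loops for Game Stage), the prompt template wording, and the function names are all hypothetical; only the feature names Game Stage and Army Count and the example values Early and low come from the captions, and the description string stands in for text produced by the BLIP-2 vision encoder.

```python
from bisect import bisect_right

# Hypothetical bin edges: a value below the first edge maps to the first label,
# a value at or above the last edge maps to the last label.
# These thresholds are illustrative assumptions, not values from the paper.
BINS = {
    "Game Stage": ([6000, 16000], ["Early", "Mid", "Late"]),  # assumed unit: game loops
    "Army Count": ([10, 30], ["Low", "Medium", "High"]),
}


def to_category(feature, value):
    """Map a numerical feature value to its categorical label."""
    edges, labels = BINS[feature]
    return labels[bisect_right(edges, value)]


def build_prompt(global_features, frame_description):
    """Combine categorical global features with a vision-generated description."""
    parts = ["Game state:"]
    parts += [f"{name}: {to_category(name, value)}"
              for name, value in global_features.items()]
    parts.append(f"Screen description: {frame_description}")
    parts.append("Predict the next build action:")  # hypothetical task instruction
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt({"Game Stage": 3000, "Army Count": 5},
                          "a few workers mining near the main base")
    print(prompt)
```

For the example call, the prompt lists "Game Stage: Early" and "Army Count: Low" before the screen description. Binning keeps the prompt short and gives the language model a small vocabulary of states to learn, at the cost of discarding fine-grained numerical detail.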