Experiments with Encoding Structured Data for Neural Networks

Sujay Nagesh Koujalgi, Jonathan Dodge

TL;DR

This work tackles AI decision-making in a sequential, partially observable wargaming domain called Battlespace by examining how to encode complex structured data for neural processing. It compares three encodings of the game state, couples Monte Carlo Tree Search (MCTS) with random exploration, and trains neural-network agents (a CNN or dense networks) within a DQN framework, using signals derived from MCTS rollouts. Key findings show that the choice of encoding profoundly affects learning: layered representations reduce orientation biases and improve generalization, while MCTS-driven training faces severe state-space and computation bottlenecks that bias agents toward safe actions. The study suggests future work on game-theoretic approaches that bypass explicit search trees and on richer move-history encoding to better capture strategic dynamics in Battlespace, with implications for designing decision-support and intelligent-opponent systems in complex, multi-agent environments.

Abstract

The project's aim is to create an AI agent capable of selecting good actions in a game-playing domain called Battlespace. Sequential domains like Battlespace are important testbeds for planning problems; as such, the Department of Defense uses them for wargaming exercises. The agents we developed combine Monte Carlo Tree Search (MCTS) and Deep Q-Network (DQN) techniques in an effort to navigate the game environment, avoid obstacles, interact with adversaries, and capture the flag. This paper focuses on the encoding techniques we explored to present complex structured data stored in a Python class to a neural network, a necessary precursor to an agent.
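
As a rough illustration of what encoding structured data held in a Python class can look like, the sketch below flattens a list of unit objects into a layered grid, with separate layers for friendly and enemy units. It is not the paper's encoding: the `Unit` fields, the `KIND_IDS` table, the four-layer layout, and the 45-degree heading steps are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical unit record; the real Battlespace classes are richer."""
    row: int
    col: int
    kind: str     # illustrative names such as "tank" or "plane"
    heading: int  # degrees, assumed to be a multiple of 45
    team: int


KIND_IDS = {"tank": 1, "plane": 2}  # 0 is reserved for an empty cell


def encode_board(units, size, my_team):
    """Return 4 layers of size x size cells:
    [friendly kind, friendly heading, enemy kind, enemy heading]."""
    layers = [[[0] * size for _ in range(size)] for _ in range(4)]
    for u in units:
        base = 0 if u.team == my_team else 2
        layers[base][u.row][u.col] = KIND_IDS[u.kind]
        # Shift the heading by one step so that heading 0 differs from "empty".
        layers[base + 1][u.row][u.col] = u.heading // 45 + 1
    return layers


units = [Unit(0, 1, "tank", 90, team=0), Unit(3, 2, "tank", 270, team=1)]
board = encode_board(units, size=4, my_team=0)
```

Keeping each faction in its own layers means a network does not have to untangle ownership from a single overloaded cell value, which is the general motivation for layered encodings.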

Paper Structure (20 sections, 4 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: An example Battlespace board in a mid-game snapshot. Four players, represented by different colors, are playing the game. Purple and Red units are on one team, deployed in the northern half of the board. Yellow and Green are on the second team, deployed in the southern half of the board. The players have employed a safe strategy by placing their flags at the edge of the board, as far from enemy territory as possible, with units in front of the flag. The figure shows the air layer and the ground layer superimposed because that is how the built-in UI renders the game.
  • Figure 2: Class hierarchy used in Battlespace. Abstract classes have no fill color and are drawn with the following shapes: Unit as a triangle, Movable as a rectangle, and Immovable as an oval. Textured Units are those the player deploys; Units with a solid fill are not deployed by the player during the deployment phase. A blue fill marks units the player controls; a red fill marks units that are not under player control after the deployment phase.
  • Figure 3: How the AI agent makes a decision, showing nouns in black boxes, verbs on arrows, and the data involved above each step in the pipeline (adapted from dodge2022mutants). The process begins with a board, which is converted to a board tensor by separating game objects by faction. Here, the green tank is about to move, and we assume the green plane and blue tank are friendly toward the green tank, while the red tank is an enemy, so we encode their type and heading into the tensor. A convolutional neural network then takes the board tensor as input and outputs outcome probabilities, which the agent then scores for each action, producing a score matrix. After enforcing domain constraints on the score matrix, the agent applies a softmax to the scores. Finally, to select an action, the agent samples from the resulting distribution. Parts in cyan activate only during training. In contrast, most AI architectures for this task directly map either board$\rightarrow$action (an opaque box) or (board, action)$\rightarrow$value (which must be run many times to generate actions or explanations).
  • Figure 4: Histograms of the actions selected by the NN agent in 4 different games, after training for 20 epochs with a batch size of 4 and MCTS rollouts set to 500. Each panel shows all moves from the start of a game until it is over, with the number of rounds played shown in the top right-hand corner. These charts are for a simplified version of the game, played between two players with one land unit each; hence, the number of unique actions is 12. For visual clarity, some moves (e.g., 'turn0', 'turn45', 'advance1', 'ram') are omitted from the figure because they went unused in all of these games. The actions in red are "non-impactful" or non-offensive, while the actions in blue are transitional or offensive.
  • Figure 5: A human player makes a move by selecting move options for each playable unit from the terminal.
  • ...and 3 more figures
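
The last three steps of the pipeline in the Figure 3 caption (enforce domain constraints, apply a softmax, sample an action) can be sketched as below. This is a minimal illustration rather than the authors' implementation: the scores are made-up numbers standing in for the network-derived values, the score matrix is simplified to one vector for a single unit, and `masked_softmax`, `sample_action`, and the `legal` mask are hypothetical names. The action labels are borrowed from the Figure 4 caption.

```python
import math
import random


def masked_softmax(scores, legal):
    """Softmax over legal actions only; illegal actions get probability 0."""
    if not any(legal):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal score for numerical stability.
    peak = max(s for s, ok in zip(scores, legal) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(actions, scores, legal, rng=random):
    """Sample one action from the constrained softmax distribution."""
    probs = masked_softmax(scores, legal)
    return rng.choices(actions, weights=probs, k=1)[0]


actions = ["turn0", "turn45", "advance1", "ram"]
scores = [0.2, 1.5, 0.7, 3.0]      # placeholder values, not model output
legal = [True, True, True, False]  # assume "ram" is ruled out by a domain constraint

probs = masked_softmax(scores, legal)
choice = sample_action(actions, scores, legal, random.Random(0))
```

Applying the mask before the softmax, as the caption describes, ensures that a disallowed action can never be sampled, no matter how high its raw score is.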