Reinforcement Learning for High-Level Strategic Control in Tower Defense Games

Joakim Bergdahl, Alessandro Sestini, Linus Gisslén

TL;DR

This work tackles automated playtesting and difficulty validation for tower-defense-style mobile games, focusing on Plants vs. Zombies. It introduces a hybrid RL approach (HRL) that learns high-level strategy selection while delegating low-level actions to an existing heuristic AI (HAI), which keeps testing scalable while letting the agent adapt to each level. Empirical results show that HRL outperforms the purely heuristic and random baselines on 40 PvZ levels in terms of success rate and cumulative reward, though level-specific puzzles limit generalization to unseen levels. The findings suggest HRL can generate actionable level-testing data and validate difficulty at scale, with future work aimed at improving cross-level generalization and extending the approach to other genres such as real-time strategy games.

Abstract

In strategy games, one of the most important aspects of game design is maintaining a sense of challenge for players. Many mobile titles feature quick gameplay loops that allow players to progress steadily, requiring an abundance of levels and puzzles to prevent them from reaching the end too quickly. As with any content creation, testing and validation are essential to ensure engaging gameplay mechanics, enjoyable game assets, and playable levels. In this paper, we propose an automated approach that can be leveraged for gameplay testing and validation that combines traditional scripted methods with reinforcement learning, reaping the benefits of both approaches while adapting to new situations similarly to how a human player would. We test our solution on a popular tower defense game, Plants vs. Zombies. The results show that combining a learned approach, such as reinforcement learning, with a scripted AI produces a higher-performing and more robust agent than using only heuristic AI, achieving a 57.12% success rate compared to 47.95% in a set of 40 levels. Moreover, the results demonstrate the difficulty of training a general agent for this type of puzzle-like game.
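
The division of labor described above, a learned controller that picks a high-level strategy and a scripted AI that turns it into a concrete move, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the strategy names, the `heuristic_ai` rules, the two-bucket state key, and the tabular epsilon-greedy learner are hypothetical stand-ins, not the paper's action set, heuristic AI, or training algorithm.

```python
import random
from collections import defaultdict

# Hypothetical high-level strategies; the paper's real action set is not given here.
STRATEGIES = ("economy", "defend", "attack")


def heuristic_ai(strategy, state):
    """Stand-in for the scripted AI: maps a chosen strategy to a low-level action."""
    if strategy == "economy":
        return "place_sun_producer"
    if strategy == "defend":
        return "place_blocker" if state["threat"] > 0 else "wait"
    return "place_attacker"


class HighLevelController:
    """Toy learned controller: tabular value estimates with epsilon-greedy selection."""

    def __init__(self, epsilon=0.1, lr=0.5, seed=0):
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # One value per (state bucket, strategy), initialised to zero.
        self.values = defaultdict(lambda: {s: 0.0 for s in STRATEGIES})

    def select(self, state_key):
        # Explore with probability epsilon, otherwise pick the best-valued strategy.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)
        q = self.values[state_key]
        return max(STRATEGIES, key=lambda s: q[s])

    def update(self, state_key, strategy, reward):
        # Move the estimate for the chosen strategy toward the observed reward.
        q = self.values[state_key]
        q[strategy] += self.lr * (reward - q[strategy])


def hybrid_step(controller, state):
    """One decision: the learner picks a strategy, the heuristic executes it."""
    state_key = "threat" if state["threat"] > 0 else "calm"  # assumed coarse bucketing
    strategy = controller.select(state_key)
    action = heuristic_ai(strategy, state)
    return state_key, strategy, action
```

In this sketch the learner never touches individual unit placements; it only chooses among a few strategies, so its action space stays small while the scripted layer handles the details.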

Paper Structure (20 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Screenshot of Plants vs. Zombies 2 gameplay. Enemy zombies attack from the right, and the player has to defend their home base line on the leftmost side of the play area. If an enemy crosses this line, the player loses. The player is free to place units from the loadout visible on the left in any available, unoccupied cell of the game board. When a unit is greyed out, it is locked behind a cool-down timer. Each unit's sun token cost is displayed in the unit's lower-right corner.
  • Figure 2: Training progression of the HRL agent (orange) over four levels, compared to the mean success rates of HAI (blue) and a random agent (green) over 100 episodes in each level. In the displayed levels, the HRL agent learns to outperform HAI in approximately 400 to 1500 episodes. Both the HRL agent and HAI consistently outperform the random agent. The gathered statistics are averaged over $5$ different seeds.
  • Figure 3: Action distribution comparison between the HAI, random, and HRL agents over four levels, showing the average, normalized occurrence of each action over 100 episodes repeated for $5$ different seeds. As evident from the plots, the HRL agent's use of actions differs from that of the baselines and varies from level to level, suggesting that the agent has learned to leverage actions based on the dynamics of each level.
  • Figure 4: Performance of the HRL agent (orange) compared to HAI (blue) and a random agent (green) with increasing difficulty in 4 levels. Five difficulties were used, specifically [0, 50K, 100K, 150K, 200K]. More details about the difficulty can be found in the paper's section on levels and difficulty. The general performance of the agent indicates that it successfully completes more levels than the baselines at each difficulty. We collect statistics for the experiments for 100 levels and 5 different seeds. Note that in the experiments, the agents were trained with a difficulty of 100K, the midpoint of the aforementioned difficulty set.