
Improving Conditional Level Generation using Automated Validation in Match-3 Games

Monica Villanueva Aylagas, Joakim Bergdahl, Jonas Gillberg, Alessandro Sestini, Theodor Tolstoy, Linus Gisslén

TL;DR

This paper tackles the challenge of unreliable validity and limited user control in PCGML-based level generation by introducing Avalon, a framework that leverages automated gameplay validation during training to condition a generator on difficulty statistics. Implemented as a conditional variational autoencoder, Avalon conditions the generation on factors such as the median number of moves to solve, board size, and symmetry, and employs a partial-generation masking strategy to enforce structural constraints. Empirical results in a simplified match-3 setting show that difficulty conditioning improves playability, increasing valid levels from 43.75% to 51.39%, while incurring modest declines in size, diversity, and tile-distribution fidelity. The approach demonstrates practical potential for producing valid, stylized levels with controllable difficulty, and points to future work in richer validation signals, multi-layer level representations, and broader game genres.
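The validity figures above rest on the median number of moves a bot needs to solve a level. The minimal Python sketch below shows that bookkeeping. It assumes each level has been played several times by a bot that reports one move count per run, and that a level counts as valid when its median falls at or below a threshold. The functions `median_moves`, `is_valid`, and `valid_fraction` and the `threshold` parameter are hypothetical illustrations, not the authors' code.

```python
from statistics import median
from typing import Sequence


def median_moves(move_counts: Sequence[int]) -> float:
    """Median number of moves over several bot playthroughs of one level."""
    if not move_counts:
        raise ValueError("need at least one playthrough")
    return median(move_counts)


def is_valid(move_counts: Sequence[int], threshold: float) -> bool:
    """Assumption: a level is valid when its median move count is at or below threshold."""
    return median_moves(move_counts) <= threshold


def valid_fraction(levels: Sequence[Sequence[int]], threshold: float) -> float:
    """Share of levels classified as valid (0.0 for an empty collection)."""
    if not levels:
        return 0.0
    return sum(is_valid(m, threshold) for m in levels) / len(levels)


print(valid_fraction([[10, 12, 14], [40, 60, 80]], threshold=30))  # 0.5
```

In this toy example, the first level's median (12) is under the threshold and the second's (60) is not, so half of the levels count as valid.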

Abstract

Generative models for level generation have shown great potential in game production. However, they often provide limited control over the generation, and the validity of the generated levels is unreliable. Despite this, only a few approaches that learn from existing data give users ways to control the generation while also addressing the generation of unsolvable levels. This paper proposes Avalon, a novel method to improve models that learn from existing level designs using difficulty statistics extracted from gameplay. In particular, we use a conditional variational autoencoder to generate layouts for match-3 levels, conditioning the model on pre-collected statistics that capture game mechanics, such as difficulty, and relevant visual features, such as size and symmetry. Our method is general enough that multiple approaches could be used to generate these statistics. We quantitatively evaluate our approach by comparing it to an ablated model without difficulty conditioning. Additionally, we analyze both quantitatively and qualitatively whether the style of the dataset is preserved in the generated levels. Our approach generates more valid levels than the same method without difficulty conditioning.
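To make the conditioning concrete, the sketch below packs the three conditioners (board size, symmetry type, and median number of moves) into a flat vector of the kind a conditional model receives. The symmetry vocabulary, the normalization constants, the rows-by-columns reading of a size such as 5x6, and the names `LevelConditions` and `encode_conditions` are all assumptions for illustration. Leaving `median_moves` unset mimics the ablated model without difficulty conditioning.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical symmetry vocabulary; the paper's exact set may differ.
SYMMETRIES = ("none", "horizontal", "vertical", "both")


@dataclass
class LevelConditions:
    rows: int
    cols: int
    symmetry: str
    median_moves: Optional[float] = None  # None drops difficulty, like the ablated model


def encode_conditions(c: LevelConditions, max_rows: int = 9, max_cols: int = 9,
                      max_moves: float = 100.0) -> List[float]:
    """Flatten the conditioners into [rows, cols, one-hot symmetry, (moves)].

    The max_* defaults are hypothetical normalization constants.
    """
    if c.symmetry not in SYMMETRIES:
        raise ValueError(f"unknown symmetry: {c.symmetry!r}")
    vec = [c.rows / max_rows, c.cols / max_cols]
    vec += [1.0 if c.symmetry == s else 0.0 for s in SYMMETRIES]
    if c.median_moves is not None:
        # Clip so unusually long levels do not exceed 1.0.
        vec.append(min(c.median_moves, max_moves) / max_moves)
    return vec


print(encode_conditions(LevelConditions(5, 6, "vertical", 20), max_rows=10, max_cols=10))
# [0.5, 0.6, 0.0, 0.0, 1.0, 0.0, 0.2]
```

In a standard conditional VAE, such a vector is concatenated with the encoder input and with the latent sample before decoding; the exact injection point in Avalon may differ.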

Paper Structure (23 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Avalon approach. Before training (gray dotted lines in the figure), the dataset is constructed from levels created by level designers. Offline, we extract a set of features for each level: the median number of moves required to solve it, the board size, and the type of symmetry. These features serve as conditioners for the model during training. They can be extracted using scripted bots or other methods (e.g., RL agents or game testers). For our experiments, we use scripted bots and train a conditional variational autoencoder. During inference (red continuous lines in the figure), the designer controls the generation through the conditional features and can manually edit the output level or generate a new one.
  • Figure 2: Example application of a vertical symmetry mask. The mask is first extracted from a level in the dataset and then applied to the same level as $L \circ M^{\text{sym}}$. The masked level is part of the training dataset.
  • Figure 3: Image representation of the levels. Black pixels represent GAP cells, red pixels represent BLOCK cells, and dark red pixels represent PLAYFIELD cells. (a) Training examples from the main dataset. (b) Training examples from the stylized dataset. (c) Inference examples from the Vanilla generator. (d) Inference examples from the VanillaStylized generator. (e) Inference examples from the Avalon generator. (f) Inference examples from the AvalonStylized generator. The text in subplots (c) through (f) indicates the conditioners used: in subplots (c) and (d), it gives the size (e.g., 5x6) and the symmetry (e.g., Vertical) used as input to Vanilla and VanillaStylized, while subplots (e) and (f) also include the number of moves (e.g., 20) used as input to Avalon and AvalonStylized.
  • Figure 4: Analysis of the main dataset's training set. The heatmap shows the number of levels for each size (x-axis) and each median number of moves required to solve the level (y-axis), according to the statistics extracted by our bot. The horizontal line indicates the threshold that separates valid from invalid levels.
  • Figure 5: Additional Avalon generated levels. We show examples using all types of symmetries (columns), $3$ levels of difficulty ($2$ rows per difficulty) and $8$ different sizes (repeated every $2$ rows).
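The masking step in Figure 2 can be illustrated with plain Python lists. The sketch makes four assumptions:

* A level is a grid of integer cell codes (hypothetically GAP = 0, BLOCK = 1, PLAYFIELD = 2).
* "Vertical symmetry" means mirroring across the vertical axis.
* The mask keeps the left half of the board, including the middle column when the width is odd.
* $L \circ M^{\text{sym}}$ is the element-wise (Hadamard) product.

`vertical_symmetry_mask`, `hadamard`, and `mirror_vertical` are hypothetical helpers, not the paper's implementation.

```python
from typing import List

Grid = List[List[int]]

# Hypothetical integer codes for the cell types named in Figure 3.
GAP, BLOCK, PLAYFIELD = 0, 1, 2


def vertical_symmetry_mask(rows: int, cols: int) -> Grid:
    """Assumed mask for vertical symmetry: 1 on the left half (middle column
    included when cols is odd), 0 on the mirrored right half."""
    keep = (cols + 1) // 2
    return [[1 if c < keep else 0 for c in range(cols)] for _ in range(rows)]


def hadamard(level: Grid, mask: Grid) -> Grid:
    """Element-wise product L ∘ M of two equally shaped grids."""
    if len(level) != len(mask) or any(len(a) != len(b) for a, b in zip(level, mask)):
        raise ValueError("level and mask must have the same shape")
    return [[x * m for x, m in zip(row_l, row_m)] for row_l, row_m in zip(level, mask)]


def mirror_vertical(level: Grid) -> Grid:
    """Rebuild a vertically symmetric grid by reflecting the left half onto the right."""
    out = []
    for row in level:
        cols = len(row)
        keep = (cols + 1) // 2
        out.append([row[c] if c < keep else row[cols - 1 - c] for c in range(cols)])
    return out


level = [[PLAYFIELD, BLOCK, PLAYFIELD, GAP],
         [PLAYFIELD, PLAYFIELD, BLOCK, PLAYFIELD]]
masked = hadamard(level, vertical_symmetry_mask(rows=2, cols=4))
print(masked)                   # [[2, 1, 0, 0], [2, 2, 0, 0]]
print(mirror_vertical(masked))  # [[2, 1, 1, 2], [2, 2, 2, 2]]
```

With these integer codes, masked cells collapse to GAP. A one-hot or multi-channel encoding would instead zero out every channel of a masked cell. Mirroring the kept half, as `mirror_vertical` does, is one way to recover a fully symmetric board from a partial one.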