Co-Evolving Complexity: An Adversarial Framework for Automatic MARL Curricula

Brennen Hill

TL;DR

This work tackles the challenge of scaling environmental complexity for generalizable agents by introducing a co-evolving adversarial framework in which a generative Attacker produces dynamically challenging worlds for a cooperative Defender team. By casting environment generation and policy learning as a nearly zero-sum, partially observable Markov game, the approach yields an open-ended curriculum that drives the rapid emergence of sophisticated tactics on both sides. Quantitative and qualitative analyses show frequently recurring emergent strategies (e.g., Attacker Tandem and Flanking; Defender Spreading and Focusing) and indicate that co-evolution outperforms static baselines, suggesting strong potential for improving robustness and generalization in MARL. The work highlights the practical value of automatic curricula and self-scaling environments for building resilient, strategic multi-agent systems.

Abstract

The advancement of general-purpose intelligent agents is intrinsically linked to the environments in which they are trained. While scaling models and datasets has yielded remarkable capabilities, scaling the complexity, diversity, and interactivity of environments remains a crucial bottleneck. Hand-crafted environments are finite and often contain implicit biases, limiting the potential for agents to develop truly generalizable and robust skills. In this work, we propose a paradigm for generating a boundless and adaptive curriculum of challenges by framing the environment generation process as an adversarial game. We introduce a system where a team of cooperative multi-agent defenders learns to survive against a procedurally generative attacker. The attacker agent learns to produce increasingly challenging configurations of enemy units, dynamically creating novel worlds tailored to exploit the defenders' current weaknesses. Concurrently, the defender team learns cooperative strategies to overcome these generated threats. This co-evolutionary dynamic creates a self-scaling environment where complexity arises organically from the adversarial interaction, providing an effectively infinite stream of novel and relevant training data. We demonstrate that with minimal training, this approach leads to the emergence of complex, intelligent behaviors, such as flanking and shielding by the attacker, and focus-fire and spreading by the defenders. Our findings suggest that adversarial co-evolution is a powerful mechanism for automatically scaling environmental complexity, driving agents towards greater robustness and strategic depth.

Paper Structure

This paper contains 37 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: The game environment. The four Defender agents (1), (2), (3), and (4) can only move horizontally. The Attacker generates an arbitrary number of orange Units, shown here as (5), (6), (7), and (8), which move vertically downwards from the top of the board.
  • Figure 2: An illustration of the average episode length (Defender survival time in ticks) over training episodes. The plot shows a general upward trend, indicating skill improvement, alongside oscillations that suggest an ongoing arms race in which the Attacker discovers new strategies. This curve is representative of the observed dynamic rather than a plot of raw data from a single training run.
  • Figure 3: Examples of emergent adversarial strategies from the generative Attacker agent.
  • Figure 4: Examples of emergent cooperative strategies from the Defender team.