A Framework for Scalable Heterogeneous Multi-Agent Adversarial Reinforcement Learning in IsaacLab

Isaac Peterson, Christopher Allred, Jacob Morrey, Mario Harper

TL;DR

The paper addresses the need for scalable adversarial multi-agent reinforcement learning in high-fidelity robotic simulations with heterogeneous morphologies. It introduces HARL-A, a framework that extends IsaacLab with per-team critics and a HAPPO-based training loop to maintain meaningful value signals in competitive, zero-sum settings, along with a curriculum-learning strategy and zero-buffer observation padding. Through environments like Sumo, Soccer, and 3D Galaga, the authors demonstrate emergent adversarial behaviors, improved win rates, and robust policy learning under both alternating and simultaneous training. The work provides a practical, extensible platform that facilitates robust, morphology-diverse adversarial MARL in embodied robotics, with potential applications in pursuit-evasion, security, and competitive manipulation, and outlines concrete future enhancements for scalability and evaluation.
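Zero-buffer observation padding keeps the policy's input size fixed across curriculum stages. Observations that only matter in later stages, such as the opponent's position in Sumo, are part of the vector from the start but are filled with zeros until a stage enables them. The following is a minimal NumPy sketch of that idea. The `pad_observation` helper, the slot names, and the slot sizes are hypothetical and chosen only for illustration. They are not taken from the paper or the HARL-A code.

```python
import numpy as np

# Hypothetical observation layout: (slot name, size). The layout is the same in
# every curriculum stage, so the network's input dimension never changes.
OBS_LAYOUT = [
    ("robot_state", 6),     # e.g. velocity / joint terms (illustrative size)
    ("opponent_pos", 3),    # opponent position relative to the robot
    ("ring_radius", 1),     # radius r of the ring
    ("dist_to_center", 1),  # distance d of the robot to the ring center
]

def pad_observation(values, active_slots):
    """Concatenate all slots in OBS_LAYOUT order; inactive slots are zero-filled."""
    parts = []
    for name, size in OBS_LAYOUT:
        if name in active_slots:
            v = np.asarray(values[name], dtype=np.float32).reshape(size)
        else:
            # Placeholder zeros until a later curriculum stage enables this slot.
            v = np.zeros(size, dtype=np.float32)
        parts.append(v)
    return np.concatenate(parts)

# Usage: an early stage sees only its own state; a later stage sees everything.
values = {"robot_state": np.ones(6), "opponent_pos": [0.5, -1.0, 0.0],
          "ring_radius": [2.0], "dist_to_center": [0.3]}
stage1 = pad_observation(values, {"robot_state"})
stage2 = pad_observation(values, {"robot_state", "opponent_pos",
                                  "ring_radius", "dist_to_center"})
```

Both stages produce vectors of the same length. This means the same network can be reused when the curriculum advances, and weights learned for the early slots remain valid.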

Abstract

Multi-Agent Reinforcement Learning (MARL) is central to robotic systems cooperating in dynamic environments. While prior work has focused on these collaborative settings, adversarial interactions are equally critical for real-world applications such as pursuit-evasion, security, and competitive manipulation. In this work, we extend the IsaacLab framework to support scalable training of adversarial policies in high-fidelity physics simulations. We introduce a suite of adversarial MARL environments featuring heterogeneous agents with asymmetric goals and capabilities. Our platform integrates a competitive variant of Heterogeneous Agent Reinforcement Learning with Proximal Policy Optimization (HAPPO), enabling efficient training and evaluation under adversarial dynamics. Experiments across several benchmark scenarios demonstrate the framework's ability to model and train robust policies for morphologically diverse multi-agent competition while maintaining high throughput and simulation realism. Code and benchmarks are available at: https://github.com/DIRECTLab/IsaacLab-HARL .
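In HAPPO, the agents of a team are updated one at a time in a random order. Each agent's clipped surrogate is weighted by a factor M. This factor is the team advantage multiplied by the probability ratios of the teammates already updated in that round. The NumPy sketch below shows only that bookkeeping. Following the per-team critics described in the TL;DR, it assumes that each team's advantage comes from its own critic. The function names, the `update_agent` callback, and the toy ratios are hypothetical. The actual gradient step, which would normally use an autodiff framework, is stubbed out. This is not the paper's competitive variant itself.

```python
import numpy as np

def happo_surrogate(ratio, m_factor, clip_eps=0.2):
    """Clipped HAPPO surrogate for one agent (to be maximized).

    ratio:    pi_new(a|o) / pi_old(a|o) of the agent being updated, per sample.
    m_factor: team advantage times the ratios of teammates updated earlier this round.
    """
    unclipped = ratio * m_factor
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * m_factor
    return np.mean(np.minimum(unclipped, clipped))

def sequential_team_update(advantage, update_agent, agent_ids, rng):
    """One HAPPO round for a single team (assumed: advantage from this team's own critic).

    update_agent(agent_id, m_factor) is a hypothetical callback that optimizes the
    agent against happo_surrogate and returns its new/old probability ratios.
    """
    m_factor = np.asarray(advantage, dtype=np.float64).copy()
    order = rng.permutation(agent_ids)  # random update order each round
    for agent_id in order:
        ratio = update_agent(agent_id, m_factor)
        m_factor = m_factor * ratio  # later agents see earlier agents' policy changes
    return order, m_factor

# Usage with a stubbed update that returns fixed ratios per agent.
rng = np.random.default_rng(0)
adv = np.array([1.0, -0.5])
fake_ratios = {"quadruped": np.array([1.1, 0.9]), "rover": np.array([1.0, 1.2])}
order, m_final = sequential_team_update(
    adv, lambda agent_id, m: fake_ratios[agent_id], ["quadruped", "rover"], rng)
surrogate_example = happo_surrogate(np.array([1.5, 0.5]), np.array([1.0, 1.0]))
```

In a zero-sum match, a single critic shared by both teams would be fitting rewards that cancel each other out. Giving each team its own critic, and running the sequential update within each team, keeps every advantage tied to that team's own return.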

Paper Structure

This paper contains 16 sections, 6 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Heterogeneous Agent Reinforcement Learning Adversarial (HARL-A) environments in IsaacLab showcasing adversarial heterogeneous multi-agent settings in competition learning tasks. (Top) Two quadruped teams compete in a Sumo task with two Leatherback Rovers. (Bottom) Mixed-agent teams of humanoids and robots in the HARL-A framework.
  • Figure 2: Outline of different actor-critic training paradigms in adversarial reinforcement learning. With the addition of this framework, training paradigms B) and D) are now possible.
  • Figure 3: Evolution of the state space during curriculum learning for the Anymal-C robot in the Sumo adversarial environment. The Anymal's state vector includes velocity, joint positions, etc.; $\mathbf{x}_{val}$ is the position of $val$ relative to the robot; $\mathbf{0}$ denotes zero vectors that hold the place of observations introduced in later stages, as demonstrated in wang2020few for curriculum learning; $\mathbf{r}$ is the radius of the ring; and $\mathbf{d}$ is the distance of the Anymal to the center of the ring.
  • Figure 4: (Top) Leatherback possession and goal. (Bottom) Leatherback possession followed by a turnover and an Anymal goal.
  • Figure 5: 3D Galaga: Anti-Aircraft Defense. Green MiniTanks fire arm-aligned laser-tag rays (2 and 3) every control step. A drone (flight path shown in 1) is knocked out when its position comes within a radius of any active ray (4), or if it drops below the minimum flight height.
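Figure 5 gives the knock-out rule for 3D Galaga: a drone is out if it comes within a radius of any active ray, or if it drops below the minimum flight height. That rule reduces to a point-to-ray distance test. The following is a minimal NumPy sketch under these assumptions:

- Each MiniTank ray is modeled as a half-infinite ray with an origin and a direction.
- "Within a radius" is read as the perpendicular distance to the ray being less than `hit_radius`.
- Height is the z coordinate.

The function names and parameters are hypothetical, not the environment's actual API.

```python
import numpy as np

def point_to_ray_distance(p, origins, directions):
    """Distance from point p, shape (3,), to each ray; origins/directions have shape (N, 3)."""
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)  # unit directions
    t = np.einsum("ij,ij->i", p - origins, d)  # projection of p onto each ray
    t = np.maximum(t, 0.0)  # points behind a ray's origin are measured to the origin
    closest = origins + t[:, None] * d
    return np.linalg.norm(p - closest, axis=1)

def drone_knocked_out(p, origins, directions, hit_radius, min_height):
    """Assumed rule: out if below min_height or within hit_radius of any active ray."""
    if p[2] < min_height:
        return True
    if len(origins) == 0:  # no active rays this control step
        return False
    return bool(np.any(point_to_ray_distance(p, origins, directions) < hit_radius))

# Usage: one vertical ray fired from the origin.
origins = np.array([[0.0, 0.0, 0.0]])
directions = np.array([[0.0, 0.0, 1.0]])
hit = drone_knocked_out(np.array([0.1, 0.0, 5.0]), origins, directions, 0.25, 0.5)
miss = drone_knocked_out(np.array([1.0, 0.0, 5.0]), origins, directions, 0.25, 0.5)
too_low = drone_knocked_out(np.array([1.0, 0.0, 0.2]), origins, directions, 0.25, 0.5)
```

Because the check is vectorized over rays, it can be run for every drone at every control step without a Python loop over the MiniTanks' rays.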