Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning

Ardian Selmonaj, Giacomo Del Rio, Adrian Schneider, Alessandro Antonucci

TL;DR

This work presents a hierarchical multi-agent reinforcement learning framework (HMARL) for realistic 3D air combat using JSBSim physics. By structuring decision making into low-level continuous control policies and high-level commander policies, the approach enables coordinated maneuvers while handling partial observability. It couples curriculum learning and league-play to progressively increase task difficulty and robustness, and introduces a multi-agent adaptation of Simple Policy Optimization (MA-SPO) within a centralized training and decentralized execution paradigm. Experiments across varied team sizes show that hierarchical agents achieve superior combat performance and resilience compared with non-hierarchical baselines, including strong performance in large-scale engagements. The framework advances realism and coordination in air combat MARL and provides a foundation for sim-to-real transfer, human–AI collaboration, and further tactical planning research.

Abstract

Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
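The two abstraction levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper both levels are learned policies, whereas here they are replaced by hand-written placeholder rules so the control flow is visible. The command names (`attack`, `evade`), the observation layout (`obs[0]` as a relative bearing, `obs[1]` as a threat score), and the four-element control vector are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Observation = List[float]  # hypothetical layout: [relative_bearing, threat_score]
Control = List[float]      # hypothetical layout: [aileron, elevator, rudder, throttle]


def _clip(x: float) -> float:
    """Clamp a control input to the range [-1, 1]."""
    return max(-1.0, min(1.0, x))


def attack_policy(obs: Observation) -> Control:
    # Placeholder low-level policy: roll toward the opponent bearing.
    return [_clip(obs[0]), 0.0, 0.0, 1.0]


def evade_policy(obs: Observation) -> Control:
    # Placeholder low-level policy: roll away from the opponent bearing.
    return [_clip(-obs[0]), 0.0, 0.0, 1.0]


# Registry of low-level policies that the commander can activate.
LOW_LEVEL: Dict[str, Callable[[Observation], Control]] = {
    "attack": attack_policy,
    "evade": evade_policy,
}


def commander_policy(obs: Observation) -> str:
    # Placeholder high-level rule standing in for a learned commander:
    # evade when the (hypothetical) threat score exceeds 0.5.
    return "evade" if obs[1] > 0.5 else "attack"


def hierarchical_step(obs: Observation) -> Control:
    command = commander_policy(obs)  # high level: discrete tactical command
    return LOW_LEVEL[command](obs)   # low level: continuous control output


if __name__ == "__main__":
    print(hierarchical_step([0.3, 0.1]))  # low threat: the attack policy is used
    print(hierarchical_step([0.3, 0.9]))  # high threat: the evade policy is used
```

Keeping the commander's output discrete and the low-level output continuous mirrors the split the abstract describes, and it lets each level be trained or swapped independently.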

Paper Structure

This paper contains 19 sections, 17 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Aircraft postures showing: (a) attacking angles, and (b) cannon fire modeled through the WEZ.
  • Figure 2: Hierarchical simulation process: for each aircraft type, a shared commander policy decides which low-level policy to activate, and the activated policy in turn produces the control maneuvers.
  • Figure 3: Rendered air combat trajectories in (a) our custom environment using PyGame, and (b) VR-Forces when deployed.
  • Figure 4: Training results of all $\pi_c$ on L1 across different tasks.
  • Figure 5: Training of $\pi_h$ through MA-SPO and MA-PPO, and of $\pi_{\text{\scriptsize fc}}$ with FC-SPO, against mixed-strategy opponents (Sec. \ref{sec:curriculum}).