Learning Environment for the Air Domain (LEAD)

Andreas Strand; Patrick Gorton; Martin Asprusten; Karsten Brathen

TL;DR

LEAD is a modular, Gymnasium-based learning environment for air-domain computer-generated forces (CGFs) that interoperates with third-party simulators through distributed simulation. It combines a fast, replaceable Simulation System (SACS) with an Agent Gateway, a Simulation Interpreter, and a Distributed Simulation Service to support reinforcement learning (RL) and imitation learning (IL) of realistic air-combat behavior. The authors demonstrate formation flight learned with Proximal Policy Optimization (PPO) in LEAD: the wingman holds formation robustly under varying lead maneuvers and random initial conditions, and the experiment gives insight into training dynamics and scalability. The work supports practical, scalable workflows for training adaptive CGFs across simulations of different fidelities, which could improve pilot training by providing intelligent autonomous adversaries.

Abstract

A substantial part of fighter pilot training is simulation-based and involves computer-generated forces controlled by predefined behavior models. These behavior models are typically created manually by eliciting knowledge from experienced pilots, which is a time-consuming process. Despite this effort, the behavior models are often unsatisfactory because they are predictable and do not adapt, forcing instructors to spend time manually monitoring and controlling them. Reinforcement learning and imitation learning are alternatives to handcrafted models. This paper presents the Learning Environment for the Air Domain (LEAD), a system for creating and integrating intelligent air combat behavior in military simulations. By incorporating Gymnasium, a popular programming library and interface, LEAD allows users to apply readily available machine learning algorithms. Additionally, LEAD can communicate with third-party simulation software through distributed simulation protocols, which allows behavior models to be learned and employed using simulation systems of different fidelities.
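
To illustrate why a Gymnasium-style interface lets off-the-shelf algorithms plug in, the sketch below shows the reset/step call pattern that Gymnasium environments follow: `reset` returns an observation and an info dictionary, and `step` returns an observation, a reward, `terminated` and `truncated` flags, and an info dictionary. This is not LEAD's code. `ToyFormationEnv`, its one-dimensional dynamics, its reward, and `run_episode` are hypothetical stand-ins written only for illustration, and the example deliberately avoids importing any library so that it runs on its own.

```python
import random


class ToyFormationEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step call pattern.

    The state is a 1-D offset (in meters) between a wingman and its
    formation point. None of this reflects LEAD's actual dynamics.
    """

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.steps = 0
        self.offset = 0.0

    def reset(self, seed=None):
        # Random initial condition, reproducible through the seed.
        rng = random.Random(seed)
        self.steps = 0
        self.offset = rng.uniform(-1000.0, 1000.0)
        return self.offset, {}  # (observation, info)

    def step(self, action):
        # The action is a commanded displacement, clipped to +/- 100 m per step.
        move = max(-100.0, min(100.0, action))
        self.offset += move
        self.steps += 1
        reward = -abs(self.offset)           # illustrative reward only
        terminated = abs(self.offset) < 1.0  # formation point reached
        truncated = self.steps >= self.max_steps
        return self.offset, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Run one episode with any callable policy: observation -> action."""
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated
    return obs, total_reward


# A trivial proportional policy: steer straight back toward zero offset.
final_offset, episode_return = run_episode(ToyFormationEnv(), lambda obs: -obs)
```

Because the loop in `run_episode` depends only on the reset/step contract, the same loop would work with any environment that honors that contract, which is the property the abstract relies on when it says users can apply readily available algorithms.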

Paper Structure

This paper contains 15 sections, 1 equation, 6 figures, and 2 tables.

INTRODUCTION
LEARNING ENVIRONMENT FOR THE AIR DOMAIN
Simulation System
Agent Gateway
Simulation Interpreter
Distributed Simulation Service
High Level Architecture Connection
Machine Learning Connection
Agent
EXPERIMENT
Configuration of LEAD
Learning Formation Flight with Proximal Policy Optimization
Results
Discussion
CONCLUSION

Figures (6)

Figure 1: An agent interacts with LEAD and learns by reinforcement or imitation. The configuration file provided to LEAD (blue section) defines the simulation, agent gateway and interpreter settings. The agent (green section) includes a selected ML algorithm and agent policy type. Learning may occur once the ML algorithm's hyperparameters are set manually or through an optimizer.
Figure 2: The architecture of SACS. The simulation system consists of a world with a terrain and entities. Each entity has seven subsystems, some of which have multiple instances or implementations. Some of these subsystems depend on other subsystems, depicted by dashed arrows.
Figure 3: The HLA objects and interactions that make up the FOM used in the LEAD federation when using the HLA connection.
Figure 4: Formation flight with a lead aircraft (right) and wingman aircraft (left). North is up. The formation point (black circle) represents where the wingman is instructed to fly, given by a certain distance $d_{\textnormal{l}}$ and aspect angle $\alpha_{\textnormal{l}}$ from the lead aircraft. However, the wingman deviates by a distance $d_{\textnormal{w}}$ with a bearing angle $\alpha_{\textnormal{w}}$. The formation point moves according to the lead aircraft with speed $v_{\textnormal{p}}$ at an angle $\alpha_{\textnormal{p}}$. Aircraft image source: Goldhawk Interactive.
Figure 5: The task of the agent is to control the wingman aircraft, which amounts to achieving and maintaining formation, measured by a Gaussian reward function with $a = 5 \times 10^{-7}\,\mathrm{m}^{-2}$.
...and 1 more figure
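
The Gaussian reward named in the Figure 5 caption can be sketched as follows. The captions give only the parameter $a$ and the wingman's deviation distance $d_{\textnormal{w}}$ from the formation point, so the functional form $r = \exp(-a\,d_{\textnormal{w}}^2)$ used here is an assumption, as are the helper names `gaussian_reward` and `deviation` and the flat two-dimensional coordinates in meters.

```python
import math

A = 5e-7  # reward parameter a from the Figure 5 caption, in m^-2


def gaussian_reward(distance_m, a=A):
    """Assumed form r = exp(-a * d^2); equals 1 at the formation point."""
    return math.exp(-a * distance_m ** 2)


def deviation(wingman_xy, formation_xy):
    """Straight-line distance d_w between wingman and formation point (m)."""
    return math.hypot(wingman_xy[0] - formation_xy[0],
                      wingman_xy[1] - formation_xy[1])


# Reward for a wingman 600 m east and 800 m north of the formation point,
# i.e. a deviation of 1000 m.
d_w = deviation((600.0, 800.0), (0.0, 0.0))
reward_at_1km = gaussian_reward(d_w)
```

Under this assumed form the reward falls to one half at $d_{\textnormal{w}} = \sqrt{\ln 2 / a}$, roughly 1.18 km, so the reward still gives a useful gradient when the wingman starts far from the formation point.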
