A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

Jiajun Chai; Wenzhang Chen; Yuanheng Zhu; Zong-xin Yao; Dongbin Zhao

TL;DR

This work tackles autonomous, continuous-action 6-DOF air combat by proposing a model-free hierarchical RL framework that splits control into an inner-loop flight controller and an outer-loop combat strategy. Both loops are trained with Proximal Policy Optimization, with a reward design that balances tracking accuracy and control smoothness, and a fictitious self-play mechanism that evolves stronger outer-loop strategies over generations. The results show that the RL-based flight controller outperforms PID in tracking tasks, while the self-play–driven outer-loop strategies achieve higher win rates and more efficient maneuvers, illustrating the approach’s effectiveness for complex, high-dimensional, zero-sum settings. The framework offers practical implications for robust UCAV autonomy and suggests pathways for extending to multi-agent and general-sum scenarios in future work.
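For reference, both loops are trained with PPO. The standard clipped surrogate objective from the original PPO formulation is given below; this summary does not restate it, so the symbols are the conventional ones rather than the paper's own notation. Here $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping parameter.

```latex
\[
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
\]
```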

Abstract

Unmanned combat air vehicle (UCAV) combat is a challenging scenario with a continuous action space. In this paper, we propose a general hierarchical framework to solve the within-visual-range (WVR) air-to-air combat problem under six-degree-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision process into two loops and use reinforcement learning (RL) to solve them separately. The outer loop takes into account the current combat situation and decides the expected macro behavior of the aircraft according to a combat strategy. The inner loop then tracks the macro behavior with a flight controller by calculating the actual input signals for the aircraft. We design the Markov decision process for both the outer-loop strategy and the inner-loop controller, and train them with the proximal policy optimization (PPO) algorithm. For the inner-loop controller, we design an effective reward function to accurately track various macro behaviors. For the outer-loop strategy, we further adopt a fictitious self-play mechanism that improves combat performance by repeatedly fighting against historical strategies. Experimental results show that the inner-loop controller achieves better tracking performance than a fine-tuned PID controller, and that the outer-loop strategy can perform complex maneuvers and attains an increasingly higher win rate as the generations evolve.
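The two-loop decision process and the pool of historical opponents described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `OpponentPool`, `hierarchical_step`, and `toy_controller` are hypothetical, the macro behavior is assumed to be a (target roll, target pitch) pair, opponents are assumed to be sampled uniformly from history, and the controller is a simple clipped proportional stand-in for the trained PPO policy.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases; the summary does not define these names.
Observation = Sequence[float]
MacroBehavior = Tuple[float, float]  # assumed: (target roll, target pitch)
Strategy = Callable[[Observation], MacroBehavior]
Controller = Callable[[Observation, MacroBehavior], Tuple[float, float]]


class OpponentPool:
    """Keeps frozen snapshots of past outer-loop strategies for self-play."""

    def __init__(self, seed: int = 0) -> None:
        self._snapshots: List[Strategy] = []
        self._rng = random.Random(seed)

    def add(self, strategy: Strategy) -> None:
        """Store a strategy snapshot at the end of a generation."""
        self._snapshots.append(strategy)

    def sample(self) -> Strategy:
        """Pick a historical opponent (uniform sampling is an assumption)."""
        if not self._snapshots:
            raise ValueError("opponent pool is empty")
        return self._rng.choice(self._snapshots)

    def __len__(self) -> int:
        return len(self._snapshots)


def _clip(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def toy_controller(flight_obs: Observation,
                   target: MacroBehavior) -> Tuple[float, float]:
    """Stand-in inner loop: clipped proportional tracking, NOT the PPO policy.

    Assumes flight_obs = (roll, pitch) and returns (aileron, elevator)
    commands in [-1, 1].
    """
    roll, pitch = flight_obs[0], flight_obs[1]
    return _clip(target[0] - roll), _clip(target[1] - pitch)


def hierarchical_step(strategy: Strategy,
                      controller: Controller,
                      combat_obs: Observation,
                      flight_obs: Observation) -> Tuple[float, float]:
    """Outer loop picks a macro behavior; inner loop turns it into inputs."""
    target = strategy(combat_obs)
    return controller(flight_obs, target)
```

In a training loop built on this sketch, each generation would add the current strategy to the pool and draw the opponent for each episode with `pool.sample()`, so that the learner keeps facing earlier versions of itself.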

Paper Structure (18 sections, 12 equations, 10 figures, 3 tables, 2 algorithms)

Introduction
Contribution
Organization
Problem Formulation
Air-combat Scenario
Proximal Policy Optimization
RL-Based 6-DOF Flight Controller
Hierarchical Decision for 6-DOF Air Combat
Experiments on RL-based Flight Controller
Experimental Setup
Results
Experiments on Air Combat Strategy
Experimental Setup
Main Results
Strategy Analysis
...and 3 more sections

Figures (10)

Figure 1: Major components of 6-DOF dynamics [seto2000case].
Figure 2: Attack range of each aircraft in the combat scenario.
Figure 4: Performance matrix of the flight controller. (a) Alive bonus: a controller with a higher score is better at maintaining stability. (b) Tracking performance: a controller with a higher score tracks the target signals more accurately.
Figure 5: Learning curves of the episode reward (blue), alive bonus (red), and tracking error (green) of the final flight controller.
Figure 6: Performance of the trained PPO-based flight controller and the PID-based flight controller. The controllers must track the given target signals (roll and pitch angles). (a) Sine-cosine signal. (b) Step signal. (c) Random signal.
...and 5 more figures
