Interpretable DRL-based Maneuver Decision of UCAV Dogfight

Haoran Han, Jian Cheng, Maolong Lv

TL;DR

This work tackles interpretability in DRL-driven UCAV dogfights under realistic 6-DOF dynamics by proposing a three-layer frame: a high-level maneuver decision layer (DDQN), a library of eight BFMs, and low-level actuation (four-channel PID). The DDQN is trained against a DT opponent and achieves an $85.75\%$ win rate against it, while exhibiting interpretable strategies such as yo-yo adjustments and an emergent "Dive and Chase" tactic, identified through post-hoc analysis of agent behavior. Key contributions are the three-layer frame, the eight-BFM library, and an open gym environment, which together enable transparent evaluation of DRL policies in complex, nonlinear aerial combat. The results indicate that DRL can yield superior maneuverability and discover novel tactics while remaining interpretable, which is crucial for safety-conscious autonomous combat systems.

Abstract

This paper proposes a three-layer unmanned combat aerial vehicle (UCAV) dogfight frame in which deep reinforcement learning (DRL) is responsible for high-level maneuver decision. A four-channel low-level control law is constructed first, followed by a library containing eight basic flight maneuvers (BFMs). A double deep Q network (DDQN) is applied for BFM selection in the UCAV dogfight, where the opponent strategy during training is constructed with DT. Our simulation results show that the agent achieves a win rate of 85.75% against the DT strategy and obtains positive results when facing various unseen opponents. With the proposed frame, the interpretability of the DRL-based dogfight is significantly improved. The agent performs a yo-yo to adjust its turn rate and gain higher maneuverability. The emergence of "Dive and Chase" behavior also indicates that the agent can generate a novel tactic that exploits a drawback of its opponent.
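
To make the BFM-selection step concrete, the following is a minimal sketch of the standard Double DQN target computation over a discrete set of eight actions, one per BFM. It is not the authors' implementation: the function name `double_dqn_targets`, the discount factor `gamma`, and the array shapes are assumptions chosen for illustration, and the paper's network architecture, reward design, and hyperparameters are not given in this section.

```python
import numpy as np

N_BFMS = 8  # size of the maneuver library stated in the abstract


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    rewards, dones : shape (batch,)
    q_online_next  : online-network Q-values at the next state, shape (batch, N_BFMS)
    q_target_next  : target-network Q-values at the next state, shape (batch, N_BFMS)
    gamma          : discount factor (hypothetical value, not from the paper)
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # The online network chooses the next maneuver ...
    best_bfm = np.argmax(q_online_next, axis=1)
    # ... and the target network evaluates that choice.
    evaluated = q_target_next[np.arange(best_bfm.shape[0]), best_bfm]

    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * evaluated
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from plain DQN, where a single network does both and tends to overestimate Q-values.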

Paper Structure (18 sections, 19 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: Geometric relation of UCAVs in the dogfight, where the blue one is controlled by the agent, and the red one is regarded as the opponent.
  • Figure 2: Overview of the proposed UCAV dogfight frame.
  • Figure 3: (a) Accumulated number and (b) rate of wins, losses, and ties over the training process. The solid lines show the average over 10 trials, while the shaded regions span the maximum and the minimum.
  • Figure 4: (a) 3D trajectories, and responses of (b) ATA and distance, and (c) Mach number and height of the two UCAVs in the first case, under the double-loop dogfight condition.
  • Figure 5: 3D trajectories during (a) $[0, 110)$ s, (b) $[110, 140)$ s, and (c) $[140, 169]$ s, and responses of (d) ATA and distance, and (e) height in the second case.