An Imitative Reinforcement Learning Framework for Pursuit-Lock-Launch Missions
Siyuan Li, Rongchang Zuo, Bofei Liu, Yaoyu He, Peng Liu, Yingnan Zhao
TL;DR
The paper tackles autonomous UCAV pursuit-lock-launch policy learning in within-visual-range engagements by introducing an imitative reinforcement learning framework that fuses expert demonstrations with autonomous exploration in a high-fidelity Harfang3D simulator. It uses an actor–critic architecture with double Q networks and a dual-loss objective that blends RL learning with imitation via a tunable weighting strategy, enabling efficient and robust acquisition of multistage policies. Empirical results show the approach achieves up to 100% hit success and consistently outperforms state-of-the-art RL and imitation methods across diverse opponent maneuvers, while requiring comparatively less expert data. A key limitation noted is the dependence on expert data quality, with future work proposed around hierarchical policy designs to reduce reliance on expert demonstrations.
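The dual-loss idea summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the exact loss terms and weighting strategy are not given in this section, so the sketch assumes a clipped double-Q actor term, a mean-squared behavior-cloning term, and a linearly decaying imitation weight. The names `dual_actor_loss`, `bc_weight`, and `linear_decay` are hypothetical.

```python
import numpy as np


def clipped_double_q(q1, q2):
    """Element-wise minimum of two critic estimates (a common double-Q choice)."""
    return np.minimum(q1, q2)


def dual_actor_loss(q1, q2, policy_actions, expert_actions, bc_weight):
    """Blend an RL term with an imitation term (assumed form, for illustration).

    bc_weight is a hypothetical tunable weight in [0, 1]:
    1.0 gives pure imitation, 0.0 gives pure RL.
    """
    if not 0.0 <= bc_weight <= 1.0:
        raise ValueError("bc_weight must be in [0, 1]")
    # RL term: the actor is pushed toward actions the critics value highly.
    rl_loss = -np.mean(clipped_double_q(q1, q2))
    # Imitation term: mean squared error between policy and expert actions.
    bc_loss = np.mean((policy_actions - expert_actions) ** 2)
    return (1.0 - bc_weight) * rl_loss + bc_weight * bc_loss


def linear_decay(step, total_steps, start=1.0, end=0.1):
    """Assumed schedule: shift weight from imitation to RL as training proceeds."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


# Toy batch of two transitions with one-dimensional actions.
q1 = np.array([1.0, 2.0])
q2 = np.array([2.0, 1.0])
policy_actions = np.array([[0.0], [0.0]])
expert_actions = np.array([[1.0], [1.0]])

loss = dual_actor_loss(q1, q2, policy_actions, expert_actions, bc_weight=0.5)
```

A high imitation weight early in training lets expert data guide the policy, and lowering it later leaves more room for autonomous exploration; the particular schedule here is only one possible choice.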
Abstract
Unmanned Combat Aerial Vehicle (UCAV) Within-Visual-Range (WVR) engagement, a close-quarters fight between two or more UCAVs, plays a decisive role on the aerial battlefield. With the development of artificial intelligence, WVR engagement is progressively advancing towards intelligent and autonomous modes. However, autonomous WVR engagement policy learning is hindered by challenges such as weak exploration capabilities, low learning efficiency, and unrealistic simulated environments. To overcome these challenges, we propose a novel imitative reinforcement learning framework, which efficiently leverages expert data while enabling autonomous exploration. The proposed framework not only enhances learning efficiency through expert imitation, but also ensures adaptability to dynamic environments via autonomous exploration with reinforcement learning. As a result, the proposed framework can learn a successful "pursuit-lock-launch" policy for UCAVs. To support data-driven learning, we establish an environment based on the Harfang3D sandbox. Extensive experimental results indicate that the proposed framework excels in this multistage task and significantly outperforms state-of-the-art reinforcement learning and imitation learning methods. Thanks to its ability to imitate experts and explore autonomously, our framework can quickly learn the critical knowledge in complex aerial combat tasks, achieving a success rate of up to 100% and demonstrating excellent robustness.
