Non-Equilibrium MAV-Capture-MAV via Time-Optimal Planning and Reinforcement Learning
Canlun Zheng, Zhanyu Guo, Zikang Yin, Chunyu Wang, Zhikun Wang, Shiyu Zhao
TL;DR
This paper addresses agile MAV-to-MAV capture, formulated as a high-dimensional, non-equilibrium control problem, and evaluates two distinct strategies: offline Time-Optimal Planning (TOP) and online Reinforcement Learning (RL) using Proximal Policy Optimization (PPO). TOP generates short, highly agile trajectories at the cost of a heavy computational load, while RL provides real-time adaptability and stability, as shown by smoother flight and robust interception in simulation and in limited real-world tests. A compact capture MAV with a dedicated rubber-ball launcher is developed so that the RL policy can run on board in real time, and a detailed model of the ball trajectory and the MAV dynamics supports both approaches. The study highlights the complementary strengths of TOP and RL for agile capture and demonstrates practical viability through simulation and indoor RL experiments, with implications for counter-MAV applications and for future integration of sensing and perception.
Abstract
The capture of flying MAVs (micro aerial vehicles) has garnered increasing research attention due to its intriguing challenges and promising applications. Despite recent advancements, a key limitation of existing work is that capture strategies are often relatively simple and constrained by platform performance. This paper addresses control strategies capable of capturing highly maneuverable targets. The need to achieve capture under unstable, non-equilibrium flight conditions distinguishes this task from traditional pursuit-evasion and guidance problems. In this study, we move from larger MAV platforms to a specially designed, compact capture MAV that carries a custom launching device while retaining high maneuverability. We explore both time-optimal planning (TOP) and reinforcement learning (RL) methods. Simulations demonstrate that TOP yields shorter, more aggressive trajectories, while RL excels in real-time adaptability and stability. Moreover, the RL method has been tested in real-world scenarios, where it successfully captured the target even in unstable states.
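To make the non-equilibrium aspect concrete, the following is a minimal sketch of why launching from a moving MAV differs from launching at hover: the ball inherits the MAV's velocity at release. This is not the paper's ball-trajectory model. It assumes a drag-free point mass, a z-up world frame, and a constant-velocity target, and the names `ball_position` and `min_miss_distance` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

# Gravity in a z-up world frame (assumed convention, not taken from the paper).
G = np.array([0.0, 0.0, -9.81])


def ball_position(p0, v_mav, launch_dir, muzzle_speed, t):
    """Drag-free ball position t seconds after release (illustrative model).

    The initial ball velocity is the MAV velocity at release plus the
    muzzle velocity along the (normalized) launch direction.
    """
    d = np.asarray(launch_dir, dtype=float)
    d = d / np.linalg.norm(d)
    v0 = np.asarray(v_mav, dtype=float) + muzzle_speed * d
    return np.asarray(p0, dtype=float) + v0 * t + 0.5 * G * t**2


def min_miss_distance(p0, v_mav, launch_dir, muzzle_speed,
                      p_target, v_target, horizon=1.0, steps=1001):
    """Smallest ball-to-target distance over a sampled time horizon.

    The target is assumed to fly at constant velocity, which is a
    simplification chosen for this sketch.
    """
    p_target = np.asarray(p_target, dtype=float)
    v_target = np.asarray(v_target, dtype=float)
    best = np.inf
    for t in np.linspace(0.0, horizon, steps):
        ball = ball_position(p0, v_mav, launch_dir, muzzle_speed, t)
        target = p_target + v_target * t
        best = min(best, float(np.linalg.norm(ball - target)))
    return best


if __name__ == "__main__":
    # Same aim direction and muzzle speed, with and without MAV motion.
    aim = [1.0, 0.0, 0.0]
    target = ball_position([0, 0, 0], [0, 0, 0], aim, 10.0, 0.5)
    hover = min_miss_distance([0, 0, 0], [0, 0, 0], aim, 10.0, target, [0, 0, 0])
    moving = min_miss_distance([0, 0, 0], [0, 3, 0], aim, 10.0, target, [0, 0, 0])
    print(f"miss at hover: {hover:.3f} m, miss while moving: {moving:.3f} m")
```

In this toy setting a shot that hits when launched at hover misses once the MAV has lateral velocity, which is one way to see why the launch decision has to account for the MAV's full state rather than only its position.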
