Non-Equilibrium MAV-Capture-MAV via Time-Optimal Planning and Reinforcement Learning

Canlun Zheng, Zhanyu Guo, Zikang Yin, Chunyu Wang, Zhikun Wang, Shiyu Zhao

TL;DR

This paper addresses agile MAV-to-MAV capture by formulating a high-dimensional, non-equilibrium control problem and evaluating two distinct strategies: offline Time-Optimal Planning (TOP) and online Reinforcement Learning (RL) using Proximal Policy Optimization. TOP generates brief, highly maneuverable trajectories at the expense of computational load, while RL provides real-time adaptability and stability, evidenced by smoother flight and robust interception in simulation and limited real-world tests. A compact capture MAV with a dedicated rubber-ball launcher is developed to enable real-time RL on-board, and a detailed ball-trajectory and MAV dynamics model supports both approaches. The study highlights the complementary strengths of TOP and RL for agile capture and demonstrates practical viability through simulation and indoor RL experiments, with implications for counter-MAV applications and future sensor-perception integration.

Abstract

The capture of flying MAVs (micro aerial vehicles) has garnered increasing research attention due to its intriguing challenges and promising applications. Despite recent advancements, a key limitation of existing work is that capture strategies are often relatively simple and constrained by platform performance. This paper addresses control strategies capable of capturing high-maneuverability targets. The unique challenge of achieving target capture under unstable conditions distinguishes this task from traditional pursuit-evasion and guidance problems. In this study, we transition from larger MAV platforms to a specially designed, compact capture MAV equipped with a custom launching device while maintaining high maneuverability. We explore both time-optimal planning (TOP) and reinforcement learning (RL) methods. Simulations demonstrate that TOP offers highly maneuverable and shorter trajectories, while RL excels in real-time adaptability and stability. Moreover, the RL method has been tested in real-world scenarios, successfully achieving target capture even in unstable states.

Paper Structure

This paper contains 14 sections, 14 equations, 6 figures.

Figures (6)

  • Figure 1: The capture MAV can launch a rubber ball in a non-equilibrium attitude to hit the target MAV.
  • Figure 2: The capture MAV platform. (a) shows the capture MAV. (b) shows the mechanical structure of the launch device, the dynamics of the flying ball, and the components of the launching velocity. (c) shows the capture MAV launching a rubber ball in a real flight experiment.
  • Figure 3: (a) shows the control policy, including RL and TOP. (b) shows the training simulation environment developed in Isaac Gym.
  • Figure 4: Scenario 1 simulation results. (a) shows the capture MAV launching at a static target from three different initial positions: above, below, and behind the target. (b) shows, for each of the two methods, the distance between the capture MAV and the target, the minimum distance between the predicted ball hitting position and the target, and the distance between the launched ball and the target. (c) and (d) show the capture MAV's speeds and attitudes, respectively. In (b), (c), and (d), $\circ$ marks the launching time of the capture MAV.
  • Figure 5: Scenario 2 simulation results. (a) shows the capture MAV launching at a target moving at 2 $\rm m/s$ from four different initial positions. (b) shows the capture MAV launching at a target in circular motion at a speed of 4 $\rm m/s$ from four different initial positions. (c) shows the launching-state statistics of the capture MAV, including the relative speeds, relative distances, flying time, $\theta$, $\phi$, miss-distances (the closest distance between the flying ball's trajectory and the target's position), and angular velocity. (d) shows how the miss-distance relates to the relative speed, the relative distance, and the ball flying time, respectively.
  • ...and 1 more figure
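
The miss-distance defined in the Figure 5 caption (the closest distance between the flying ball's trajectory and the target's position) can be illustrated with a minimal sketch. It assumes a drag-free ballistic ball, a static target, and a z-up world frame. The paper's own ball-trajectory model is not reproduced here, and the function names `ball_position` and `miss_distance`, together with the parameters `t_max` and `dt`, are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumption: gravity only, no drag)


def ball_position(p0, v0, t):
    """Position of a drag-free ball t seconds after launch from p0 with velocity v0 (z up)."""
    x = p0[0] + v0[0] * t
    y = p0[1] + v0[1] * t
    z = p0[2] + v0[2] * t - 0.5 * G * t * t
    return (x, y, z)


def miss_distance(p0, v0, target, t_max=2.0, dt=1e-3):
    """Closest distance between the sampled ball trajectory and a static target position.

    t_max and dt are illustrative sampling parameters, not values from the paper.
    """
    best = math.inf
    steps = int(round(t_max / dt))
    for k in range(steps + 1):
        d = math.dist(ball_position(p0, v0, k * dt), target)
        best = min(best, d)
    return best


# Example: a ball launched horizontally at 5 m/s passes through a target
# placed where the ball is after one second of free fall.
on_path = miss_distance((0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 0.0, -0.5 * G))

# Example: a target 1 m directly above the launch point is closest at launch time.
above = miss_distance((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Sampling the trajectory on a fixed time grid keeps the sketch simple. For a moving target, the target position would have to be evaluated at each sample time as well.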