
ProxFly: Robust Control for Close Proximity Quadcopter Flight via Residual Reinforcement Learning

Ruiqi Zhang, Dingqi Zhang, Mark W. Mueller

TL;DR

ProxFly addresses the challenge of robust close-proximity quadcopter flight under downwash disturbances by stacking a residual reinforcement learning module on top of a traditional cascaded controller. The residual policy learns to compensate for external disturbances without requiring inter-vehicle communication, using domain randomization to generalize across model variations and guiding the RL with the basic controller to improve data efficiency. The approach demonstrates significant gains in position and attitude accuracy in both simulation and real-world proximity tasks, achieving performance comparable to a complex model-based aerodynamic compensator and enabling reliable aerial docking under extreme proximity. This hybrid control framework reduces reliance on precise aerodynamics modeling, lowers data and communication requirements, and provides a practical path toward robust, multi-vehicle quadcopter operations in constrained environments.

Abstract

This paper proposes ProxFly, a residual deep Reinforcement Learning (RL)-based controller for close proximity quadcopter flight. Specifically, we design a residual module on top of a cascaded controller (denoted as the basic controller) to generate high-level control commands, which compensate for external disturbances and thrust loss caused by downwash effects from other quadcopters. First, our method takes only the ego state and the controllers' commands as inputs and does not rely on any communication between quadcopters, thereby reducing the bandwidth requirement. Second, through domain randomization, our method relaxes the requirement for accurate system identification and fine-tuned controller parameters, allowing it to adapt to changing system models. Third, our method not only reduces the proportion of unexplainable black-box signals in the control commands but also lets RL training skip time-consuming exploration from scratch, because the basic controller guides it. We validate the effectiveness of the residual module in simulation at different proximities. Moreover, we conduct real close proximity flight tests to compare ProxFly with the basic controller and with an advanced model-based controller that uses complex aerodynamic compensation. Finally, we show that ProxFly can be used for challenging quadcopter mid-air docking, where two quadcopters fly in extreme proximity and strong airflow significantly disrupts flight; even in this case, our method stabilizes the quadcopter and accomplishes docking. The resources are available at https://github.com/ruiqizhang99/ProxFly.
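
To make the domain randomization idea concrete, the following is a minimal sketch of per-episode parameter sampling. The parameter names (`mass`, `inertia_scale`, `pos_gain_scale`, `att_gain_scale`), the nominal mass, and the ±20% spread are illustrative assumptions, not values from the paper; the abstract only states that the model and the controller parameters are allowed to vary.

```python
import random
from dataclasses import dataclass


@dataclass
class QuadParams:
    """Hypothetical set of randomized quantities for one training episode."""
    mass: float            # vehicle mass in kg
    inertia_scale: float   # multiplier on the nominal inertia
    pos_gain_scale: float  # multiplier on the nominal position-controller gain
    att_gain_scale: float  # multiplier on the nominal attitude-controller gain


# Assumed nominal mass and relative spread (illustrative, not from the paper).
NOMINAL_MASS = 1.0
SPREAD = 0.2


def sample_params(rng: random.Random) -> QuadParams:
    """Draw each quantity uniformly within +/- SPREAD of its nominal value."""
    def scale() -> float:
        return rng.uniform(1.0 - SPREAD, 1.0 + SPREAD)

    return QuadParams(
        mass=NOMINAL_MASS * scale(),
        inertia_scale=scale(),
        pos_gain_scale=scale(),
        att_gain_scale=scale(),
    )


# One fresh parameter set per simulated episode.
rng = random.Random(0)
episodes = [sample_params(rng) for _ in range(5)]
```

A policy trained across such draws never sees a single fixed model, which is what would let it tolerate imperfect system identification and untuned gains at deployment.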

Paper Structure (13 sections, 5 equations, 5 figures, 2 tables)

This paper contains 13 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Flying quadcopters in close proximity with ProxFly. (a) Two quadcopters track circular trajectories. (b) The small quadcopter takes off, approaches, and docks with the large one in the air.
  • Figure 2: The pipeline of ProxFly. The high-level position and attitude controllers generate the desired mass-normalized thrust $\bm{c}_{des}$ and the desired body rates $\omega_{des}$. Meanwhile, the residual module takes as input the current state $s$, the desired position $p_{des}$ and attitude $\eta_{des}$, and the last command from the basic high-level controller. It then generates the residual thrust $\bm{c}_{res}$ and body rates $\omega_{res}$ to compensate the basic controller's output. The overall commands are computed from these and sent to the model-based low-level controller, which generates the motor speed commands. Ground-truth states come from the motion capture system, and the state estimator provides the estimated states to the controller.
  • Figure 3: The results of the simulated experiments. First row: The external forces in the $z$ direction and the torques around the $x$ and $y$ axes from the small quadcopter (SQ) at three different height differences $H = [0.25, 0.5, 0.75]$, based on the aerodynamics model in karan2019aerodynamics. Second row: The performance comparison of altitude, roll, and pitch attitude control using the basic controller alone (denoted as Basic Only) and ProxFly. Third row: The residual commands of mass-normalized thrust, roll rate, and pitch rate.
  • Figure 4: The demonstration of circular trajectory tracking. (a) Two quadcopters track the circular trajectory counterclockwise. The residual module generates positive thrust commands to compensate for the thrust loss and downward force caused by the downwash flow. (b) The two quadcopters track the trajectory in opposite directions. When the small quadcopter passes above the large one, the controller on the large quadcopter increases the thrust for about $1\,\mathrm{s}$ to compensate. (LPF: low-pass filtered)
  • Figure 5: The procedure of quadcopter mid-air docking. Stage 1: Taking off and hovering. The residual RL controller only assists the basic controller on the large quadcopter (LQ) to achieve faster responses. Stage 2: Small quadcopter approaching. The small quadcopter (SQ) hovers above the LQ and approaches it vertically, which generates strong downwash flow disturbances; the residual RL controller generates thrust and rate compensation to help stabilize the LQ's position and attitude. (LPF: low-pass filtered) Stage 3: Docked with LQ. The SQ falls freely from 5 cm above the LQ and docks with it, generating an impulse. The overall mass changes, and the thrust compensation from the RL controller on the LQ reaches its peak. Stage 4: SQ leaving. The SQ takes off vertically from the LQ; the downwash airflow reappears and then gradually decreases, and the thrust compensation from the RL controller decreases with it.
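
The pipeline in Figure 2 can be sketched as a single control step. This is a minimal illustration, assuming the overall command is the plain sum of the basic and residual commands (the caption says only that the overall commands "can be calculated"). The function names, the policy signature, and the constant placeholder policy are hypothetical stand-ins for the trained network.

```python
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Command = Tuple[Vec3, Vec3]  # (mass-normalized thrust, body rates)

# Hypothetical policy signature: (s, p_des, eta_des, last basic command) -> residual command
ResidualPolicy = Callable[[Sequence[float], Vec3, Vec3, Command], Command]


def add3(a: Vec3, b: Vec3) -> Vec3:
    """Element-wise sum of two 3-vectors."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def overall_command(
    state: Sequence[float],
    p_des: Vec3,
    eta_des: Vec3,
    basic_cmd: Command,
    policy: ResidualPolicy,
) -> Command:
    """Add the residual command to the basic controller's command.

    Assumption: c_cmd = c_des + c_res and omega_cmd = omega_des + omega_res.
    """
    c_des, omega_des = basic_cmd
    c_res, omega_res = policy(state, p_des, eta_des, basic_cmd)
    return add3(c_des, c_res), add3(omega_des, omega_res)


def constant_policy(state, p_des, eta_des, basic_cmd) -> Command:
    """Placeholder for the trained network: a fixed extra upward thrust."""
    return (0.0, 0.0, 0.5), (0.0, 0.0, 0.0)


# Hover command from the basic controller plus the placeholder residual.
basic_cmd: Command = ((0.0, 0.0, 9.81), (0.0, 0.0, 0.0))
c_cmd, omega_cmd = overall_command(
    [0.0] * 12, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), basic_cmd, constant_policy
)
```

Because the policy only adds a correction, a zero residual reproduces the basic controller exactly, which matches the abstract's point that the basic controller guides training from the start.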