What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study

Jiayu Chen, Chao Yu, Yuqing Xie, Feng Gao, Yinuo Chen, Shu'ang Yu, Wenhao Tang, Shilong Ji, Mo Mu, Yi Wu, Huazhong Yang, Yu Wang

TL;DR

This work tackles the sim-to-real transfer challenge in RL-based quadrotor control by proposing SimpleFlight, a PPO-based framework that integrates five concrete factors to achieve zero-shot real-world deployment. Key innovations include an enhanced input space (velocity and rotation matrix for the actor, a time vector for the critic), an action-difference based smoothness reward, selective SysID and domain randomization, and large-batch training, all validated on a Crazyflie and a custom quadrotor with strong generalization across platforms. Empirical results show a reduction of more than 50% in trajectory tracking error compared to state-of-the-art RL baselines, successful tracking of both smooth and infeasible trajectories, and robust zero-shot performance on real hardware. By integrating SimpleFlight into Omnidrones and releasing open-source code and checkpoints, the paper provides a practical, reproducible path toward more reliable RL-based quadrotor control in real-world applications.
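As a rough illustration of what an action-difference based smoothness reward can look like, the sketch below penalizes the change between two consecutive actions. The function name `smoothness_reward`, the weight name `lam`, and the choice of a Euclidean norm are assumptions made for this example; the paper's exact formulation may differ.

```python
import math
from typing import Sequence


def smoothness_reward(
    prev_action: Sequence[float],
    action: Sequence[float],
    lam: float,
) -> float:
    """Hypothetical smoothness term: -lam * ||a_t - a_{t-1}||_2.

    prev_action and action are the policy outputs at steps t-1 and t.
    lam is an illustrative weight, not a value taken from the paper.
    """
    if len(prev_action) != len(action):
        raise ValueError("actions must have the same dimension")
    # Euclidean norm of the difference between consecutive actions.
    diff_norm = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(action, prev_action))
    )
    # Larger action changes give a more negative reward.
    return -lam * diff_norm


# Example: a 4-dimensional action that changes by (3, 4, 0, 0).
penalty = smoothness_reward([0.0, 0.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0], lam=0.5)
```

In a training loop, a term of this kind would typically be added to the tracking reward at every step, so that the policy is discouraged from issuing abruptly changing commands.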

Abstract

Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibility. Recently, RL-based policies have emerged as a promising alternative due to their ability to directly map observations to actions, reducing the need for detailed system knowledge and actuation constraints. However, a significant challenge remains in bridging the sim-to-real gap, where RL-based policies often experience instability when deployed in the real world. In this paper, we investigate key factors for learning robust RL-based control policies that are capable of zero-shot deployment on real-world quadrotors. We identify five critical factors and develop a PPO-based training framework named SimpleFlight, which integrates these five techniques. We validate the efficacy of SimpleFlight on a Crazyflie quadrotor, demonstrating that it achieves more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines. The policy derived by SimpleFlight consistently excels across both smooth polynomial trajectories and challenging infeasible zigzag trajectories on quadrotors with a small thrust-to-weight ratio. In contrast, baseline methods struggle with high-speed or infeasible trajectories. To support further research and reproducibility, we integrate SimpleFlight into the GPU-based simulator Omnidrones and provide open-source access to the code and model checkpoints. We hope SimpleFlight will offer valuable insights for advancing RL-based quadrotor control. For more details, visit our project website at https://sites.google.com/view/simpleflight/.

Paper Structure

This paper contains 27 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of SimpleFlight. We begin with SysID and selective DR for quadrotor dynamics and low-level control. Next, an RL policy is trained in simulation to output collective thrust and body rates (CTBR) for tracking arbitrary trajectories and is then deployed zero-shot on a real quadrotor. The training framework focuses on three key aspects, i.e., input space design, reward design, system identification and domain randomization, as well as training techniques, identifying five critical factors to enhance zero-shot deployment.
  • Figure 2: Visualization of benchmark trajectories and corresponding trajectories followed using SimpleFlight. The reference trajectories are shown in black.
  • Figure 3: Tracking performance on the figure-eight trajectory over 10 laps.
  • Figure 5: Real-world performance of different $\lambda$ on the figure-eight trajectory. We finally choose $\lambda = 0.4$.
  • Figure 6: Effect of batch sizes on tracking performance on figure-eight trajectories. Increasing the batch size enhances real-world performance as simulation performance converges, with real-world results also stabilizing as the batch size grows further.