What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study

Jiayu Chen; Chao Yu; Yuqing Xie; Feng Gao; Yinuo Chen; Shu'ang Yu; Wenhao Tang; Shilong Ji; Mo Mu; Yi Wu; Huazhong Yang; Yu Wang

TL;DR

This work tackles the sim-to-real transfer challenge in RL-based quadrotor control by proposing SimpleFlight, a PPO-based framework that integrates five concrete factors to achieve zero-shot real-world deployment. Key innovations include an enhanced input space (velocity and rotation matrix in the actor, a time vector in the critic), an action-difference-based smoothness reward, selective SysID and domain randomization, and large-batch training. All of these are validated on the Crazyflie and on a custom quadrotor, with strong generalization across platforms. Empirical results show more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines, successful tracking of both smooth and infeasible trajectories, and robust zero-shot performance on real hardware. By integrating SimpleFlight into Omnidrones and releasing open-source code and checkpoints, the paper provides a practical, reproducible path toward more reliable RL-based quadrotor control in real-world applications.
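To make two of these factors concrete, the sketch below shows one way to assemble an actor observation containing velocity and a flattened rotation matrix, a critic observation that additionally carries a time feature, and a smoothness penalty on consecutive actions. It is a minimal illustration under stated assumptions, not SimpleFlight's implementation: the quaternion order (w, x, y, z), the observation layout, the normalized-time feature, the L2 form of the action-difference penalty, and the weight of 0.1 are all hypothetical choices.

```python
import math
from typing import List, Sequence


def quat_to_rotmat(q: Sequence[float]) -> List[List[float]]:
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix.

    The quaternion is normalized first, so small numerical drift is tolerated.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def build_actor_obs(pos_err: Sequence[float], vel: Sequence[float],
                    quat: Sequence[float]) -> List[float]:
    """Actor input: position error, linear velocity, flattened rotation matrix.

    Hypothetical layout; a real policy may include further terms
    (for example several future reference points).
    """
    rot = quat_to_rotmat(quat)
    return list(pos_err) + list(vel) + [v for row in rot for v in row]


def build_critic_obs(actor_obs: Sequence[float], step: int,
                     max_steps: int) -> List[float]:
    """Critic input: the actor observation plus a time feature.

    Assumption: time is encoded as the elapsed fraction of the episode.
    """
    return list(actor_obs) + [step / max_steps]


def smoothness_penalty(action: Sequence[float], prev_action: Sequence[float],
                       weight: float = 0.1) -> float:
    """Negative reward proportional to the L2 norm of the action change.

    The L2 form and the default weight are illustrative assumptions.
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    return -weight * math.sqrt(diff_sq)
```

A rotation matrix varies continuously with attitude, whereas a quaternion is ambiguous in sign (q and -q encode the same rotation), which is one common reason to feed the matrix to the policy instead.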

Abstract

Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibility. Recently, RL-based policies have emerged as a promising alternative because they map observations directly to actions, reducing the need for detailed system knowledge and actuation constraints. However, a significant challenge remains in bridging the sim-to-real gap: RL-based policies often become unstable when deployed in the real world. In this paper, we investigate the key factors for learning robust RL-based control policies capable of zero-shot deployment on real-world quadrotors. We identify five critical factors and develop a PPO-based training framework named SimpleFlight that integrates these five techniques. We validate the efficacy of SimpleFlight on the Crazyflie quadrotor, demonstrating that it achieves more than a 50% reduction in trajectory tracking error compared to state-of-the-art RL baselines. The policy derived by SimpleFlight consistently excels on both smooth polynomial trajectories and challenging infeasible zigzag trajectories on quadrotors with a small thrust-to-weight ratio. In contrast, baseline methods struggle with high-speed or infeasible trajectories. To support further research and reproducibility, we integrate SimpleFlight into the GPU-based simulator Omnidrones and provide open-source access to the code and model checkpoints. We hope SimpleFlight will offer valuable insights for advancing RL-based quadrotor control. For more details, visit our project website at https://sites.google.com/view/simpleflight/.
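The headline result is stated as a relative reduction in trajectory tracking error. As a hedged illustration of that arithmetic, the sketch below assumes the error is the mean Euclidean distance between time-aligned actual and reference positions; the paper's exact metric definition is not given in this section, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_tracking_error(actual: Sequence[Point],
                        reference: Sequence[Point]) -> float:
    """Mean Euclidean distance between time-aligned actual and reference positions.

    Assumed metric for illustration; the paper may define tracking error differently.
    """
    if not actual or len(actual) != len(reference):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, r) for p, r in zip(actual, reference)) / len(actual)


def relative_reduction(baseline_err: float, new_err: float) -> float:
    """Fractional error reduction of a new policy relative to a baseline."""
    if baseline_err <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_err - new_err) / baseline_err
```

With hypothetical errors of 0.20 m for a baseline and 0.08 m for a new policy, `relative_reduction(0.20, 0.08)` returns about 0.6, a 60% reduction; any value above 0.5 corresponds to a "more than 50%" improvement.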
