Sim-to-Real Transfer in Reinforcement Learning for Maneuver Control of a Variable-Pitch MAV

Zhikun Wang, Shiyu Zhao

TL;DR

The paper tackles sim-to-real transfer for a variable-pitch propeller (VPP) MAV controlled by reinforcement learning (RL). It proposes an RL deployment framework that couples curriculum learning, domain randomization, cascade control, and a system twin to bridge the reality gap, enabling zero-shot deployment to real hardware. The approach uses a planar VPP MAV model, a PPO-based policy with asymmetric networks, and a hierarchical control loop that pairs a high-level RL controller with fast low-level PD and adaptive actuator controllers. It is validated on flips and wall-backtracking in both simulation and the real world. Results show robust real-world performance with maneuver accuracy close to that achieved in simulation, demonstrating practical viability for agile VPP MAV control. The work contributes to scalable, zero-shot sim-to-real transfer for high-fidelity aerial maneuvers.

Abstract

Reinforcement learning (RL) algorithms can enable high maneuverability in micro aerial vehicles (MAVs), but transferring them from simulation to real-world use is challenging. Variable-pitch propeller (VPP) MAVs offer greater agility, yet their complex dynamics complicate sim-to-real transfer. This paper introduces a novel RL framework to overcome these challenges, enabling VPP MAVs to perform advanced aerial maneuvers in real-world settings. Our approach includes real-to-sim transfer techniques, such as system identification, domain randomization, and curriculum learning, to create robust training simulations, and a sim-to-real transfer strategy that combines a cascade control system with a fast-response low-level controller for reliable deployment. Results demonstrate the effectiveness of this framework in achieving zero-shot deployment, enabling MAVs to perform complex maneuvers such as flips and wall-backtracking.

Paper Structure

This paper contains 26 sections, 10 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: An illustration of the maneuver approach for the planar variable-pitch MAV. The virtual simulation environment is built with real-world system parameters, and the real-world implementation directly employs the controller trained in simulation.
  • Figure 2: An illustration of the planar variable-pitch MAV and the definition of the coordinate system.
  • Figure 3: An overview of the proposed VPP MAV control system.
  • Figure 4: Overview of the components and data flow for the practical planar VPP MAV system.
  • Figure 5: Experimental results comparing how a VPP actuator tracks the target thrust under different methods. The commanded thrust is shown as the red dotted line and the response thrust as the blue solid lines. One panel shows the thrust response with polynomial regression (before adjustment), whereas the other shows the thrust response with the adaptive actuator controller (after adjustment).
  • ...and 3 more figures