Towards Task-Oriented Flying: Framework, Infrastructure, and Principles

Kangyao Huang, Hao Wang, Jingyu Chen, Jintao Chen, Yu Luo, Di Guo, Xiangkui Zhang, Xiangyang Ji, Huaping Liu

TL;DR

The paper presents a task-oriented, end-to-end DRL framework for quadrotors, combined with an open-source, full-stack infrastructure (AirGym, AirGym-Real, rlPx4Controller) that enables rapid training and zero-shot sim-to-real deployment. It codifies design principles for task specification, perception, and training, and validates them on four tasks (tracking, obstacle avoidance, high-speed maneuvers, and forest navigation), demonstrating robust performance under real-world disturbances. Key contributions include a principled framework linking simulation and hardware deployment, a scalable workflow for training and transfer, and detailed strategies for system identification, hover throttle learning, action continuity, trajectory guidance, and domain randomization. The work lowers the entry barrier for practitioners deploying learning-based controllers on aerial robots and provides a practical foundation for autonomous navigation in dynamic, unstructured environments.

Abstract

Deploying robot learning methods to aerial robots in unstructured environments remains both challenging and promising. While recent advances in deep reinforcement learning (DRL) have enabled end-to-end flight control, the field still lacks systematic design guidelines and a unified infrastructure to support reproducible training and real-world deployment. We present a task-oriented framework for end-to-end DRL in quadrotors that integrates design principles for complex task specification and reveals the interdependencies among simulated task definition, training design principles, and physical deployment. Our framework involves software infrastructure, hardware platforms, and open-source firmware to support a full-stack learning infrastructure and workflow. Extensive empirical results demonstrate robust flight and sim-to-real generalization under real-world disturbances. By reducing the entry barrier for deploying learning-based controllers on aerial robots, our work lays a practical foundation for advancing autonomous flight in dynamic and unstructured environments.

Paper Structure

This paper contains 37 sections, 12 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: A DRL-based sim-to-real task-oriented framework. The left gear is the driving gear and represents simulation-related items: the simulator core, task principles as the middle layer, and sim-to-real techniques as the outermost layer. The right gear is the driven gear, with hardware devices as the inner layer and firmware and software as the outer layer.
  • Figure 2: The proposed pipeline and the relationships between its parts. (a) AirGym provides four classical tasks: tracking, avoidance, target hitting, and planning. (b) illustrates sensing processing and feature fusion during the training phase. The left image in (b) shows the visual input being compressed into a vector using VAEs: the depth image is reconstructed through self-supervised learning, and the encoder part is used as the image encoder in the DRL loop. The right image in (b) illustrates the ESDF distance, which serves as an efficient spatial description; the minimum value in the depth array is used as the ESDF distance. (c) shows the control modes that can be selected. (d) is the onboard inference module that runs on the flight platform. The VIO part is optional in our experiments, because global state estimation is not necessary in our ego-centric design. (e) builds the bridge between the PX4 flight controller and the algorithm. Finally, the device at the bottom left is our flight platform, X152b.
  • Figure 3: (a) and (b) show the tracking performance under different conditions. We set a lemniscate trajectory as the tracking reference and illustrate the cumulative flight path in indoor no-wind and outdoor windy environments. (c) shows the dynamic obstacle avoidance task, performed by end-to-end DRL sim-to-real using depth sensing. The quadrotor with the blue halo tries to dodge a fast-flying football. The experiment is conducted in the wild, in a GNSS-denied environment, and relies only on onboard sensing for perception and localization. The speed of the flying ball reaches about $15\ \mathrm{m\cdot s^{-1}}$. (d) shows the generalization experiments, which use different thrown objects at various speeds. One hundred episodes are evaluated for each experiment, and the success rates are recorded.
  • Figure 4: High-speed target hitting results. The blue halo is the quadrotor X152b fitted with a strip light. (a) shows high-speed tracking with a sigmoid reference. (b) shows a target hitting task in which the quadrotor accelerates from $0\ \mathrm{m\cdot s^{-1}}$ and hits a virtual target.
  • Figure 5: Navigating out of the woods with the planning task, deployed sim-to-real. The sub-figure at the bottom right was generated from depth images after the experiments and serves only to illustrate the terrain.
  • ...and 4 more figures
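
The Figure 2 caption states that the minimum value in the depth array is used as the ESDF distance. The snippet below is a minimal sketch of that reduction, not the authors' implementation. The function name `min_depth_distance`, the `max_range` clipping value, and the handling of invalid pixels (zeros, NaN, and infinities are ignored) are assumptions introduced here for illustration.

```python
import numpy as np


def min_depth_distance(depth, max_range=10.0):
    """Reduce a depth image to a single nearest-obstacle distance.

    Hypothetical helper: invalid pixels (non-finite or non-positive) are
    skipped, and the result is clipped to `max_range`. Both choices are
    assumptions, not details taken from the paper.
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.isfinite(depth) & (depth > 0.0)
    if not valid.any():
        # No usable reading: report the maximum range.
        return float(max_range)
    return float(min(depth[valid].min(), max_range))


if __name__ == "__main__":
    frame = np.array([[2.0, 0.0], [np.nan, 1.5]])
    print(min_depth_distance(frame))
```

A scalar like this is cheap to compute on every frame, so it could sit alongside the VAE latent vector in the observation, although how the two are actually fused is not specified in this section.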