Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning

Guanqi He, Xiaofeng Guo, Luyi Tang, Yuanhang Zhang, Mohammadreza Mousaei, Jiahe Xu, Junyi Geng, Sebastian Scherer, Guanya Shi

TL;DR

The paper addresses the need for versatile aerial manipulation by introducing an end-effector-centric framework that decouples high-level decision-making from low-level control for a fully actuated hexarotor with a 4-DoF arm. It combines an ee-centric whole-body Model Predictive Controller with online $\text{L1}$ adaptation and two high-level policies: ee-centric teleoperation and an imitation-learning policy based on Action Chunk with Transformer (ACT). The framework demonstrates precise end-effector tracking, intuitive teleoperation, and data-efficient autonomous policy learning across tasks such as writing, peg-in-hole, pick-and-place, and light-bulb replacement, validated through extensive real-world experiments and simulations. This modular approach enables cross-embodiment policy reuse and paves the way for standardizing aerial manipulation within the broader manipulation community, with future work targeting outdoor deployment and onboard perception for obstacle avoidance. The core technical contributions include the end-effector-centric MPC with disturbance adaptation, the ee-centric teleoperation interface, and the ACT-based policy learning pipeline, all integrated on a 4-DoF arm mounted on a fully actuated hexarotor.

Abstract

Aerial manipulation has recently attracted increasing interest from both industry and academia. Previous approaches have demonstrated success in various specific tasks. However, their hardware design and control frameworks are often tightly coupled with task specifications, limiting the development of cross-task and cross-platform algorithms. Inspired by the success of robot learning in tabletop manipulation, we propose a unified aerial manipulation framework with an end-effector-centric interface that decouples high-level platform-agnostic decision-making from task-agnostic low-level control. Our framework consists of a fully-actuated hexarotor with a 4-DoF robotic arm, an end-effector-centric whole-body model predictive controller, and a high-level policy. The high-precision end-effector controller enables efficient and intuitive aerial teleoperation for versatile tasks and facilitates the development of imitation learning policies. Real-world experiments show that the proposed framework significantly improves end-effector tracking accuracy, and can handle multiple aerial teleoperation and imitation learning tasks, including writing, peg-in-hole, pick and place, changing light bulbs, etc. We believe the proposed framework provides one way to standardize and unify aerial manipulation into the general manipulation community and to advance the field. Project website: https://lecar-lab.github.io/flying_hand/.
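
The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `EETarget`, `TeleopPolicy`, `ChunkedPolicy`, `ProportionalTracker`, and `run` are hypothetical, the end-effector state is reduced to a 3-D position plus a gripper flag, and a simple proportional update stands in for the whole-body MPC. The only point it makes is that both kinds of high-level policy emit the same target end-effector message, so the low-level controller never needs to know which one is driving it.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class EETarget:
    """Target end-effector state (hypothetical fields): the only message
    that crosses the high-level / low-level interface."""
    position: Vec3
    gripper_closed: bool = False


class HighLevelPolicy(Protocol):
    def next_target(self, ee_position: Vec3) -> EETarget: ...


class TeleopPolicy:
    """Turns an operator's commanded displacement into an end-effector target."""

    def __init__(self, delta: Vec3) -> None:
        self.delta = delta

    def next_target(self, ee_position: Vec3) -> EETarget:
        return EETarget(tuple(p + d for p, d in zip(ee_position, self.delta)))


class ChunkedPolicy:
    """Replays a fixed chunk of targets, standing in for a learned policy."""

    def __init__(self, chunk: Sequence[EETarget]) -> None:
        self._chunk = list(chunk)
        self._index = 0

    def next_target(self, ee_position: Vec3) -> EETarget:
        # Hold the last target once the chunk is exhausted.
        target = self._chunk[min(self._index, len(self._chunk) - 1)]
        self._index += 1
        return target


class ProportionalTracker:
    """Placeholder low-level controller (assumption): a proportional step
    toward the target, NOT the paper's whole-body MPC."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain

    def step(self, ee_position: Vec3, target: EETarget) -> Vec3:
        return tuple(
            p + self.gain * (t - p) for p, t in zip(ee_position, target.position)
        )


def run(policy: HighLevelPolicy, controller: ProportionalTracker,
        start: Vec3, steps: int) -> Vec3:
    """Closed loop: the policy proposes a target, the controller tracks it."""
    ee = start
    for _ in range(steps):
        ee = controller.step(ee, policy.next_target(ee))
    return ee
```

Swapping `TeleopPolicy` for `ChunkedPolicy` in a call to `run` changes nothing on the controller side, which is the property the end-effector-centric interface is meant to provide.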

Paper Structure

This paper contains 34 sections, 13 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: The proposed framework and system can accomplish multiple typical aerial manipulation tasks precisely and robustly, such as (a) writing "2025", (b) peg-in-hole, (c) pick-and-place, and (d) changing light bulbs.
  • Figure 2: The proposed end-effector-centric aerial manipulation framework includes the UAM platform, the ee-centric whole-body MPC, and a high-level policy that is either an ee-centric teleoperation interface or an imitation-learning policy using Action Chunk with Transformer (ACT) [Zhao et al., 2023]. The high-level policy, whether human teleoperation or a learned autonomous policy, sends the target end-effector state to the ee-centric MPC, which then generates motor commands for the UAM platform to execute.
  • Figure 3: UAM hardware system design, illustrating the key components: (1) a fully-actuated hexarotor as the base structure, (2) a 4-DoF manipulator, (3) Intel RealSense cameras for vision-based perception and feedback, and (4) an end-effector gripper for object interaction. The frame notations in the right diagram represent the coordinate axes associated with the system.
  • Figure 4: End-effector tracking performance of the aerial manipulator on an ellipse trajectory. The results indicate that the w/o MPC baseline exhibits significant tracking lag, while the w/o L1 baseline suffers from static tracking errors due to model mismatches.
  • Figure 5: End-effector tracking error distribution for three types of trajectories using our method and two baselines.
  • ...and 8 more figures