Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning

Jack Zeng, Andreu Matoses Gimenez, Eugene Vinitsky, Javier Alonso-Mora, Sihao Sun

TL;DR

This work tackles decentralized, real-world coordination of multiple micro-aerial vehicles (MAVs) to manipulate a cable-suspended load in 6-DoF without inter-agent communication. It introduces a multi-agent reinforcement learning (MARL) framework based on centralized training with decentralized execution (CTDE), trained with MAPPO and a policy shared across agents. The policy commands a mid-level action space of linear acceleration and body rates (ACCBR), which an INDI-based (incremental nonlinear dynamic inversion) low-level controller tracks, giving robust sim-to-real transfer and onboard deployment. Real-robot experiments demonstrate pose-tracking performance comparable to a centralized nonlinear model predictive controller (NMPC) while keeping inference time low and constant, and the method is resilient to load disturbances, heterogeneous agents, and single-robot failures. The approach is a step toward scalable, robust decentralized aerial manipulation with minimal sensing and no inter-agent communication; limitations remain around load pose sensing and outdoor operation.

Abstract

This paper presents the first decentralized method to enable real-world 6-DoF manipulation of a cable-suspended load using a team of Micro-Aerial Vehicles (MAVs). Our method leverages multi-agent reinforcement learning (MARL) to train an outer-loop control policy for each MAV. Unlike state-of-the-art controllers that use a centralized scheme, our policy does not require global states, inter-MAV communication, or neighboring MAV information. Instead, agents communicate implicitly through load pose observations alone, which enables high scalability and flexibility. It also significantly reduces the computational cost at inference time, enabling onboard deployment of the policy. In addition, we introduce a new action space design for the MAVs using linear acceleration and body rates. This choice, combined with a robust low-level controller, enables reliable sim-to-real transfer despite significant uncertainties caused by cable tension during dynamic 3D motion. We validate our method in various real-world experiments, including full-pose control under load model uncertainties, showing setpoint tracking performance comparable to the state-of-the-art centralized method. We also demonstrate cooperation amongst agents with heterogeneous control policies, and robustness to the complete in-flight loss of one MAV. Videos of experiments: https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
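As a rough illustration of the structure described above, the sketch below shows one way a shared actor and a centralized critic could be laid out. It is not the authors' implementation: the network sizes, the observation and state dimensions, and the command limits `ACC_MAX` and `RATE_MAX` are all hypothetical, and the weights are random rather than trained. The only points it is meant to show are that every MAV evaluates the same actor on its own local observation, that the action is split into a linear-acceleration part and a body-rate part, and that the critic is needed only during training.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3      # number of MAVs (illustrative)
OBS_DIM = 18      # hypothetical size of one MAV's local observation
STATE_DIM = 60    # hypothetical size of the privileged global state
ACT_DIM = 6       # 3 linear-acceleration + 3 body-rate commands

# Hypothetical command limits used to scale the normalized policy output.
ACC_MAX = 5.0     # m/s^2
RATE_MAX = 2.0    # rad/s


def init_mlp(sizes):
    """Random weights for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass with tanh hidden layers and a linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


actor = init_mlp([OBS_DIM, 64, ACT_DIM])   # one set of weights shared by all MAVs
critic = init_mlp([STATE_DIM, 64, 1])      # training only; sees the global state


def act(local_obs):
    """Decentralized execution: map one MAV's local observation to a command."""
    u = np.tanh(mlp(actor, local_obs))     # normalized action in [-1, 1]
    acc_cmd = ACC_MAX * u[:3]              # linear-acceleration command
    rate_cmd = RATE_MAX * u[3:]            # body-rate command
    return acc_cmd, rate_cmd


def value(global_state):
    """Centralized critic: scalar value estimate from the privileged state."""
    return float(mlp(critic, global_state)[0])


local_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
commands = [act(o) for o in local_obs]     # each MAV only needs its own row
v = value(rng.normal(size=STATE_DIM))
```

Because `act` takes a single local observation, the per-MAV inference cost in this sketch does not depend on `N_AGENTS`; only the critic input would grow with the team size, and the critic is not used at deployment.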

Paper Structure

This paper contains 17 sections, 6 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Multi-MAV lifting system performing full-pose control of a cable-suspended load. Left: simulation environment used to train the decentralized outer-loop control policy. Right: policy transferred to the real system.
  • Figure 2: Overview of our method. Dotted lines indicate components only for training; dashed lines indicate those only for real-system deployment; solid lines for both. The training process involves the centralized critic (which observes the privileged global state), direct access to MAV states, and the actuator model that maps rotor speeds to thrust forces. Shared actors make decisions based on local observations, without access to other agents’ states. The output actions, namely acceleration and body rates, are tracked by a robust model-based low-level controller based on INDI.
  • Figure 3: Time series of pose tracking results comparing our method and a centralized NMPC method [sun2025agilecooperativeaerialmanipulation]. Our method also includes a setup with 4 MAVs.
  • Figure 4: Real-world experiments. (A) Snapshot of the test with heterogeneous agents in which one MAV is manually controlled (hacked) to pull out and push in, and the other two MAVs counteract the interference of the hacked MAV. (B) Snapshot of the test where additional load is added to the original load, and the pose error with and without such model mismatch. (C) Snapshot of the case where one MAV fails in flight and the remaining two MAVs manage to control the load.
  • Figure 5: Positional and attitude errors comparing different action spaces at test time in the Gazebo environment.
  • ...and 9 more figures
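
The captions above refer to positional and attitude errors of the load without defining them. A minimal sketch of one common choice follows, assuming the positional error is the Euclidean distance to the reference position and the attitude error is the geodesic angle of the relative rotation; the paper may use a different metric. The function names and the helper `rot_z` are hypothetical and introduced only for this example.

```python
import numpy as np


def position_error(p, p_ref):
    """Euclidean distance between the load position and its reference (m)."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(p_ref)))


def attitude_error(R, R_ref):
    """Geodesic angle (rad) of the relative rotation R_ref^T R."""
    R_err = np.asarray(R_ref).T @ np.asarray(R)
    cos_angle = (np.trace(R_err) - 1.0) / 2.0
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def rot_z(yaw):
    """Rotation matrix for a yaw angle about the z axis (helper for the example)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Example: load displaced from the origin and yawed by 90 degrees.
pos_err = position_error([1.0, 2.0, 2.0], [0.0, 0.0, 0.0])
att_err = attitude_error(rot_z(np.pi / 2), np.eye(3))
```

The geodesic angle is independent of the Euler-angle convention, which makes it a convenient single-number summary of a full 3D attitude error.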