
Learning Multi-Agent Loco-Manipulation for Long-Horizon Quadrupedal Pushing

Yuming Feng, Chuye Hong, Yaru Niu, Shiqi Liu, Yuxiang Yang, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao

TL;DR

This paper proposes a hierarchical multi-agent reinforcement learning framework with three levels of control that enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world.

Abstract

Quadrupedal robots have recently achieved significant success in locomotion, but their manipulation capabilities, particularly in handling large objects, remain limited, restricting their usefulness in demanding real-world applications such as search and rescue, construction, industrial automation, and room organization. This paper tackles the task of obstacle-aware, long-horizon pushing by multiple quadrupedal robots. We propose a hierarchical multi-agent reinforcement learning framework with three levels of control. The high-level controller integrates an RRT planner and a centralized adaptive policy to generate subgoals, while the mid-level controller uses a decentralized goal-conditioned policy to guide the robots toward these subgoals. A pre-trained low-level locomotion policy executes the movement commands. We evaluate our method against several baselines in simulation and observe significant improvements: a 36.0% higher success rate and a 24.5% reduction in completion time compared with the best baseline. Our framework successfully enables long-horizon, obstacle-aware manipulation tasks like Push-Cuboid and Push-T on Go1 robots in the real world.
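
To make the division of labor between the three levels concrete, here is a minimal Python sketch of a single control step. It is not the authors' implementation: the learned policies are replaced by hand-written stand-ins, and every name and parameter (high_level_subgoal, mid_level_command, low_level_step, lookahead, standoff, lateral, v_max) is hypothetical. The RRT output is assumed to be a precomputed list of 2D waypoints, the object and robots are treated as 2D points, and the low-level locomotion policy is modeled as perfect velocity tracking.

```python
import math

Vec = tuple[float, float]


def dist(a: Vec, b: Vec) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def unit(v: Vec) -> Vec:
    n = math.hypot(v[0], v[1])
    return (0.0, 0.0) if n == 0.0 else (v[0] / n, v[1] / n)


def high_level_subgoal(path: list[Vec], obj: Vec, lookahead: float) -> Vec:
    """Hypothetical stand-in for the centralized adaptive high-level policy."""
    # Progress along the planned path: the waypoint currently nearest the object.
    start = min(range(len(path)), key=lambda k: dist(path[k], obj))
    # Shared subgoal: first later waypoint at least `lookahead` away from the object.
    for wp in path[start:]:
        if dist(wp, obj) >= lookahead:
            return wp
    return path[-1]  # near the end of the plan: aim at the final goal


def mid_level_command(robot: Vec, obj: Vec, subgoal: Vec, lateral: float,
                      standoff: float = 0.6, gain: float = 1.5,
                      v_max: float = 0.5) -> Vec:
    """Hypothetical stand-in for one robot's decentralized pushing policy."""
    d = unit((subgoal[0] - obj[0], subgoal[1] - obj[1]))  # desired push direction
    perp = (-d[1], d[0])
    # Agent-specific contact point: behind the object, shifted sideways by `lateral`.
    contact = (obj[0] - standoff * d[0] + lateral * perp[0],
               obj[1] - standoff * d[1] + lateral * perp[1])
    err = (contact[0] - robot[0], contact[1] - robot[1])
    if math.hypot(*err) < 0.05:
        cmd = d  # in position: push toward the common subgoal
    else:
        cmd = (gain * err[0], gain * err[1])  # still approaching the contact point
    speed = math.hypot(*cmd)
    scale = min(1.0, v_max / speed) if speed > 0.0 else 0.0  # clip to v_max
    return (cmd[0] * scale, cmd[1] * scale)


def low_level_step(robot: Vec, cmd: Vec, dt: float = 0.1) -> Vec:
    """Stand-in for the frozen locomotion policy: assumes perfect velocity tracking."""
    return (robot[0] + cmd[0] * dt, robot[1] + cmd[1] * dt)


def control_step(path: list[Vec], obj: Vec, robots: list[Vec],
                 laterals: list[float], lookahead: float = 1.0):
    # High level: one subgoal shared by all agents.
    subgoal = high_level_subgoal(path, obj, lookahead)
    # Mid level: each agent maps the common subgoal to its own velocity command.
    cmds = [mid_level_command(r, obj, subgoal, lat) for r, lat in zip(robots, laterals)]
    # Low level: each robot executes its command.
    new_robots = [low_level_step(r, c) for r, c in zip(robots, cmds)]
    return subgoal, cmds, new_robots
```

For example, with the path [(0, 0), (1, 0), (2, 0)], the object at the origin, and two robots already at their contact points (-0.6, ±0.3), control_step returns the shared subgoal (1, 0) and a command of (0.5, 0.0) for each robot. The sideways offset is the only thing that makes the two agents' commands differ here; in the actual framework that agent-specific behavior is learned.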

Paper Structure

This paper contains 33 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Our proposed method enables long-horizon collaborative pushing by multiple quadrupedal robots in environments with obstacles. The high-level controller within our hierarchical MARL framework generates adaptive subgoals to guide the lower-level policies during the collaborative manipulation of large objects of varying shapes. The agents' adaptive coordination ensures smooth obstacle avoidance and successful task completion, showcasing the robustness and flexibility of our hierarchical framework.
  • Figure 2: Overview of the proposed hierarchical MARL framework for collaborative long-horizon pushing tasks by quadrupedal robots. The framework comprises three layers: a high-level controller, a mid-level controller, and a low-level controller. The high-level controller utilizes an RRT planner to generate a trajectory and an adaptive policy to assign subgoals based on the dynamic states of the environment, object, and robots. The mid-level controller employs decentralized pushing policies to convert a common subgoal into agent-specific velocity commands, which are then executed by the low-level locomotion policy on each robot. Each layer is trained independently, leveraging frozen lower-level policies.
  • Figure 3: An example of the OCB reward. The robots are encouraged to push along the portion of the object's convex-hull perimeter that occludes their view of the subgoal, which drives the object approximately toward the subgoal. Here, $\vec{v}_{\text{target}}$ is a unit vector pointing from the object toward the subgoal, while $\vec{v}_i$ is the inward-directed unit normal at the point on the object's convex hull closest to robot $i$.
  • Figure 4: Comparison between our method and the one with only the RRT planner at the high-level controller.
  • Figure 5: The success rate and completion time of different numbers of robots in the task of cylinder pushing.
  • ...and 1 more figure
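
Figure 3 describes the OCB reward only qualitatively, and this listing does not give its formula. One plausible reading, and it is an assumption here, is the alignment term $r_i = \vec{v}_{\text{target}} \cdot \vec{v}_i$: it equals 1 when robot $i$ is on the occluded side of the hull and its inward normal points straight at the subgoal, and it is negative on the subgoal-facing side. The sketch below computes that dot product. It uses Andrew's monotone chain for the convex hull (a standard choice, not one the paper states), treats the object as a 2D point set (for instance, the outline vertices of the T in Push-T), and takes $\vec{v}_i$ as the unit vector from the robot to its closest hull point, which for a robot outside a convex set is an inward normal at that point. All function and parameter names are hypothetical.

```python
import math

Vec = tuple[float, float]


def convex_hull(points: list[Vec]) -> list[Vec]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Vec, a: Vec, b: Vec) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[Vec] = []
    upper: list[Vec] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def closest_point_on_segment(p: Vec, a: Vec, b: Vec) -> Vec:
    ab = (b[0] - a[0], b[1] - a[1])
    denom = ab[0] ** 2 + ab[1] ** 2
    t = 0.0 if denom == 0 else ((p[0] - a[0]) * ab[0] + (p[1] - a[1]) * ab[1]) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return (a[0] + t * ab[0], a[1] + t * ab[1])


def closest_point_on_hull(p: Vec, hull: list[Vec]) -> Vec:
    edges = [(hull[k], hull[(k + 1) % len(hull)]) for k in range(len(hull))]
    candidates = [closest_point_on_segment(p, a, b) for a, b in edges]
    return min(candidates, key=lambda c: math.dist(p, c))


def ocb_reward(robot: Vec, object_points: list[Vec], object_center: Vec,
               subgoal: Vec) -> float:
    """Assumed form r_i = v_target . v_i (hypothetical, not the paper's formula)."""
    hull = convex_hull(object_points)
    c = closest_point_on_hull(robot, hull)
    to_contact = (c[0] - robot[0], c[1] - robot[1])
    n = math.hypot(*to_contact)
    if n == 0.0:
        return 0.0  # robot on the boundary: normal undefined in this sketch
    v_i = (to_contact[0] / n, to_contact[1] / n)  # inward normal (robot outside hull)
    t = (subgoal[0] - object_center[0], subgoal[1] - object_center[1])
    tn = math.hypot(*t)
    if tn == 0.0:
        return 0.0  # object already at the subgoal
    v_target = (t[0] / tn, t[1] / tn)
    return v_target[0] * v_i[0] + v_target[1] * v_i[1]
```

For a unit square centered at the origin with the subgoal at (3, 0), a robot at (-1, 0) (behind the object) scores 1.0, a robot at (1, 0) (between the object and the subgoal) scores -1.0, and a robot at (0, 1) (beside it) scores 0. Using the convex hull rather than the raw outline keeps the closest-point normal well defined for non-convex shapes such as the T.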