Learning Visual Quadrupedal Loco-Manipulation from Demonstrations

Zhengmao He; Kun Lei; Yanjie Ze; Koushil Sreenath; Zhongyu Li; Huazhe Xu

TL;DR

The paper tackles legged loco-manipulation by combining a high-level diffusion-based BC planner with a low-level PPO-based controller, enabling a quadruped to perform manipulation tasks using only its legs. It introduces a trajectory-parameterization scheme built from a manipulator flag, rational Bézier curves for position, and SLERP-interpolated orientation, coordinated across world and body frames for robust planning and tracking. Expert demonstrations collected in parallel simulation train the planner, while a learned low-level controller tracks the end-effector precisely under varied dynamics and domain randomization, achieving sim-to-real transfer on a Unitree Aliengo. Across nine tasks, the approach outperforms baselines such as HRL and end-to-end BC/VRL, remains robust to unexpected disturbances, and uses data efficiently. The framework advances practical, mobile, leg-based manipulation in real-world environments and reduces reliance on additional robotic arms.
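The position part of the trajectory parameterization uses rational Bézier curves. As a minimal sketch, assuming a single curve segment with one positive weight per control point (the paper's curve degree, number of control points, and weight layout are not given on this page), the desired end-effector position at phase t in [0, 1] is the weight-normalized Bernstein combination of the control points. The function names `rational_bezier` and `sample_trajectory` are hypothetical.

```python
from math import comb
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rational_bezier(control_points: Sequence[Point], weights: Sequence[float], t: float) -> Point:
    """Evaluate a rational Bezier curve at phase t in [0, 1] (assumed parameterization)."""
    if len(control_points) != len(weights):
        raise ValueError("need exactly one weight per control point")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    n = len(control_points) - 1  # curve degree
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for i, (p, w) in enumerate(zip(control_points, weights)):
        # Bernstein basis B_i^n(t), scaled by this control point's weight
        b = comb(n, i) * t**i * (1.0 - t) ** (n - i) * w
        den += b
        for k in range(3):
            num[k] += b * p[k]
    # Normalizing by the summed weighted basis makes the curve "rational"
    return (num[0] / den, num[1] / den, num[2] / den)


def sample_trajectory(control_points: Sequence[Point], weights: Sequence[float], num_samples: int) -> List[Point]:
    """Sample evenly spaced phases along the curve, including both endpoints."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    return [rational_bezier(control_points, weights, i / (num_samples - 1)) for i in range(num_samples)]
```

Raising a control point's weight pulls the curve toward that point, and with all weights equal to 1 the curve reduces to an ordinary Bézier curve. For control points (0, 0, 0), (0.1, 0, 0.2), (0.2, 0, 0), the midpoint height is 0.2 / 1.5 ≈ 0.133 with weights (1, 2, 1), versus 0.1 with unit weights.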

Abstract

Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable manipulating objects, they are sometimes redundant given that a quadruped robot is essentially a mobile unit equipped with four limbs, each possessing 3 degrees of freedom (DoFs). Hence, we aim to empower a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip
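For the orientation part of the trajectory parameterization, the TL;DR above mentions SLERP. A standard quaternion SLERP between a start and a goal orientation is sketched below, assuming unit quaternions in (w, x, y, z) order. How the paper chooses the endpoint orientations and the interpolation phase is not described on this page, and `slerp` is a hypothetical helper name.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit-norm


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between unit quaternions q0 and q1 at t in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip q1 so we interpolate along the shorter arc
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: normalized linear interpolation avoids sin(theta) ~ 0
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in out))
        return tuple(c / norm for c in out)
    theta = math.acos(dot)  # angle between the two quaternions on the 4D unit sphere
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

Halfway between the identity and a 90° rotation about z, this returns the 45° rotation (cos 22.5°, 0, 0, sin 22.5°), as expected for constant-angular-velocity interpolation.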

Paper Structure (35 sections, 2 equations, 7 figures, 4 tables)

Introduction
Related Work
Mobile Manipulation
Legged Locomotion
Loco-Manipulation
Hierarchical Learning Framework
Overview
Trajectory Parameterization
Manipulator Flag
Desired Position Trajectory
Desired Orientation
The Choice of Reference Frame for Parameters
Learning Visual Manipulation Planning by BC
Framework
Input
...and 20 more sections

Figures (7)

Figure 1: We present a hierarchical learning framework to learn general loco-manipulation skills for quadruped robots. The framework enables a Unitree Aliengo robot to perform diverse skills in the real world, including lifting baskets, pressing buttons, opening doors, and closing dishwashers, all while maintaining stable locomotion over long distances. Videos are available on the project website.
Figure 2: (1) We train a control policy $\pi_{c}$ that enables an end-effector to follow curves defined by Bézier control points and weight parameters while the other three legs maintain stable locomotion. (2) We use the trained controller to collect expert data: we design manipulation trajectories for different tasks and collect demonstrations through parallel simulation. (3) We train the planner on the collected expert data with diffusion-based BC. (4) During deployment, a RealSense D435 provides the point cloud, and an external camera localizes the robot using AprilTag [apriltag2011] so that the trajectory parameters and point cloud can be transformed between coordinate frames. These inputs are fed sequentially into the planner and controller, enabling the robot to perform whole-body loco-manipulation tasks.
Figure 3: Expert Demonstration. For different objects and tasks, we design expert trajectory parameters and randomize the object poses. The blue gradient curve represents the demonstration trajectory, and the yellow arrows indicate the demonstration orientations.
Figure 4: Overview of the 9 loco-manipulation tasks we train our robot to accomplish. These tasks are designed to cover a broad range of the non-prehensile manipulation tasks that the robot can perform with a leg.
Figure 5: Performance of our method on the basket-lifting task when encountering unexpected situations. At the start, the robot estimated the trajectory from the basket's initial pose. When an external force pushed the basket, the robot quickly updated its trajectory prediction to account for the basket's new pose and successfully lifted the displaced basket.
...and 2 more figures
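Figure 2 notes that an external camera localizes the robot with AprilTag so that trajectory parameters and the point cloud can be transformed between coordinate frames. A minimal sketch of that transformation is shown below, assuming the base pose is already available as a world-frame position and a 3x3 rotation matrix (recovering it from the tag detection is not shown). The helper names are hypothetical, and which frame the planner or controller consumes is not specified on this page.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major rotation matrix R_wb (body -> world)


def yaw_rotation(yaw: float) -> Mat3:
    """Rotation about the world z-axis, used here only to build example poses."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def world_to_body(points_w: Sequence[Vec3], base_pos_w: Vec3, base_rot_w: Mat3) -> List[Vec3]:
    """Express world-frame points in the body frame: p_b = R_wb^T (p_w - t_wb)."""
    out = []
    for p in points_w:
        d = [p[k] - base_pos_w[k] for k in range(3)]
        # Multiply by the transpose: entry (k, j) of R^T is R[j][k]
        out.append(tuple(sum(base_rot_w[j][k] * d[j] for j in range(3)) for k in range(3)))
    return out


def body_to_world(points_b: Sequence[Vec3], base_pos_w: Vec3, base_rot_w: Mat3) -> List[Vec3]:
    """Inverse transform: p_w = R_wb p_b + t_wb."""
    out = []
    for p in points_b:
        out.append(tuple(sum(base_rot_w[k][j] * p[j] for j in range(3)) + base_pos_w[k] for k in range(3)))
    return out
```

For a robot at (1, 0, 0) yawed 90° about z, the world point (1, 1, 0) maps to roughly (1, 0, 0) in the body frame, i.e. one meter straight ahead. The same functions apply to Bézier control points and point-cloud points alike, since both are 3D positions; orientations would additionally need to be rotated by the base rotation.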
