Visual Whole-Body Control for Legged Loco-Manipulation

Minghuan Liu; Zixuan Chen; Xuxin Cheng; Yandong Ji; Ri-Zhao Qiu; Ruihan Yang; Xiaolong Wang

TL;DR

The paper addresses autonomous mobile manipulation with legged robots by introducing Visual Whole-Body Control (VBC), a hierarchical framework that coordinates the legs and an arm through a low-level universal goal-reaching policy and a high-level visuomotor planner guided by visual inputs. Both levels are trained entirely in simulation and transferred to real robots (19 DoF). A privileged teacher policy, which leverages rich object shape and pose information, guides learning for the high-level planner, and its knowledge is distilled into a depth-image-based visuomotor student via online imitation learning (DAgger), enabling real-world deployment with visual observations. Extensive simulations show robust pickup performance across diverse objects and heights, with ablations highlighting the benefits of depth features and the hierarchical training. Real-world experiments with 14 objects validate zero-shot sim-to-real transfer and emergent retrying behaviors, outperforming baselines that lack full arm-leg coordination. Overall, VBC demonstrates that coordinated whole-body control combined with vision-based task planning can significantly extend the workspace and reliability of legged loco-manipulation in varied environments.
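The teacher-to-student distillation mentioned above follows the general DAgger pattern: roll out the student, label the visited states with the teacher, aggregate the data, and refit. The sketch below illustrates only that loop on a toy 1-D problem. Everything in it is an assumption for illustration (the linear `teacher_action`, the scalar observation standing in for a depth image, the least-squares `fit_linear`); it is not the paper's training code, and it omits the teacher/student action mixing that some DAgger variants use.

```python
import random


def teacher_action(state):
    # Hypothetical privileged teacher: pushes a 1-D state toward zero.
    return -0.5 * state


def fit_linear(data):
    # Least-squares gain k for the student model action = k * obs.
    num = sum(obs * act for obs, act in data)
    den = sum(obs * obs for obs, _ in data)
    return num / den if den > 0 else 0.0


def dagger(iterations=5, horizon=20, seed=0):
    rng = random.Random(seed)
    data = []   # aggregated (observation, teacher label) pairs
    gain = 0.0  # untrained student
    for _ in range(iterations):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            obs = state  # toy stand-in for the student's depth observation
            data.append((obs, teacher_action(state)))  # teacher labels the state
            state = state + gain * obs  # environment follows the STUDENT's action
        gain = fit_linear(data)  # refit on all data collected so far
    return gain


student_gain = dagger()
```

The point of rolling out the student rather than the teacher is that the labeled states come from the distribution the student will actually encounter at deployment.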

Abstract

We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control (VBC), is composed of a low-level policy that uses all degrees of freedom to track the body velocities along with the end-effector position, and a high-level policy that proposes the velocities and end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments.
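The two-level structure described in the abstract can be pictured as a simple interface: the high-level policy maps a visual observation to a command (body velocities plus an end-effector position), and the low-level policy maps that command to whole-body joint targets. The sketch below is a hypothetical illustration of that interface only. The `Command` fields, the dummy policies, and the `steps_per_command` ratio are assumptions that do not come from the paper; only the 19-DoF action size is taken from the TL;DR.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_DOF = 19  # whole-body degrees of freedom, as stated in the TL;DR


@dataclass
class Command:
    body_velocity: Tuple[float, float]        # assumed layout: (forward, yaw)
    ee_position: Tuple[float, float, float]   # end-effector target position


def dummy_high_level(depth, proprio):
    # Placeholder for the visuomotor policy: ignores its inputs.
    return Command(body_velocity=(0.2, 0.0), ee_position=(0.4, 0.0, 0.3))


def dummy_low_level(command, proprio):
    # Placeholder for the goal-reaching policy: returns zero joint targets.
    return [0.0] * NUM_DOF


def run_hierarchy(high_level, low_level, depth_frames, proprio,
                  steps_per_command=4):
    # Assumption: the low level runs several steps per high-level command.
    actions = []
    for depth in depth_frames:
        command = high_level(depth, proprio)
        for _ in range(steps_per_command):
            actions.append(low_level(command, proprio))
    return actions


actions = run_hierarchy(dummy_high_level, dummy_low_level,
                        depth_frames=[None, None, None], proprio=[0.0] * NUM_DOF)
```

Keeping the command as the only channel between the two levels is what lets each level be trained separately, as the abstract describes.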
