MLM: Learning Multi-task Loco-Manipulation Whole-Body Control for Quadruped Robot with Arm

Xin Liu, Bida Ma, Chenkun Qi, Yan Ding, Nuo Xu, Zhaxizhuoma, Guorong Zhang, Pengan Chen, Kehui Liu, Zhongjie Jia, Chuyue Guan, Yule Mo, Jiaqi Liu, Feng Gao, Jiangwei Zhong, Bin Zhao, Xuelong Li

TL;DR

This work tackles multi-task whole-body loco-manipulation for a quadruped robot with an arm by combining real-world and simulation data within a reinforcement learning framework. It introduces a trajectory library with adaptive curriculum-based sampling and a Trajectory-Velocity Prediction network that handles deployment with only historical observations and differences in spatial range across tasks, achieving zero-shot sim-to-real transfer. The approach combines an asymmetric actor-critic policy formulated as a POMDP with terrain curricula and domain randomization, and it supports a diffusion-policy deployment option. Simulation and real-world experiments demonstrate effective multi-task execution, robustness to prediction errors, and scalable integration of new tasks, with demonstrations on a Go2 robot with an Airbot arm.

Abstract

Whole-body loco-manipulation for quadruped robots with arms remains a challenging problem, particularly in achieving multi-task control. To address this, we propose MLM, a reinforcement learning framework driven by both real-world and simulation data. It enables a six-DoF robotic arm-equipped quadruped robot to perform whole-body loco-manipulation for multiple tasks autonomously or under human teleoperation. To address the problem of balancing multiple tasks during the learning of loco-manipulation, we introduce a trajectory library with an adaptive, curriculum-based sampling mechanism. This approach allows the policy to efficiently leverage real-world collected trajectories for learning multi-task loco-manipulation. To address deployment scenarios with only historical observations and to enhance the performance of policy execution across tasks with different spatial ranges, we propose a Trajectory-Velocity Prediction policy network. It predicts unobservable future trajectories and velocities. By leveraging extensive simulation data and curriculum-based rewards, our controller achieves whole-body behaviors in simulation and zero-shot transfer to real-world deployment. Ablation studies in simulation verify the necessity and effectiveness of our approach, while real-world experiments on a Go2 robot with an Airbot robotic arm demonstrate the policy's good performance in multi-task execution.
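The adaptive, curriculum-based sampling idea can be illustrated with a minimal sketch. The abstract does not specify the sampling rule, so everything here is an assumption made for illustration: the class name `TrajectoryLibrary`, the running-error update, the `smoothing` and `floor` parameters, and the choice to sample tasks in proportion to their recent tracking error. The task names are borrowed from the Figure 5 caption; the trajectory entries are placeholder strings.

```python
import random


class TrajectoryLibrary:
    """Hypothetical library of reference trajectories grouped by task."""

    def __init__(self, tasks, smoothing=0.9, floor=0.05):
        # tasks: dict mapping a task name to a non-empty list of trajectories.
        self.tasks = {name: list(trajs) for name, trajs in tasks.items()}
        if len(self.tasks) * floor > 1.0:
            raise ValueError("floor too large for the number of tasks")
        self.smoothing = smoothing  # weight of the old error in the running mean
        self.floor = floor          # minimum sampling probability per task
        # Running tracking error per task; equal values give uniform sampling.
        self.error = {name: 1.0 for name in self.tasks}

    def update(self, task, tracking_error):
        """Blend the latest episode's tracking error into the running value."""
        old = self.error[task]
        self.error[task] = self.smoothing * old + (1.0 - self.smoothing) * tracking_error

    def weights(self):
        """Sampling probabilities: proportional to error, mixed with a uniform floor."""
        n = len(self.error)
        total = sum(self.error.values())
        probs = {}
        for name, err in self.error.items():
            share = err / total if total > 0 else 1.0 / n
            probs[name] = (1.0 - self.floor * n) * share + self.floor
        return probs

    def sample(self, rng=random):
        """Pick a task by weight, then one of its trajectories uniformly."""
        probs = self.weights()
        names = list(probs)
        task = rng.choices(names, weights=[probs[k] for k in names], k=1)[0]
        return task, rng.choice(self.tasks[task])


# Assumed toy data: placeholder trajectory names, not real recordings.
lib = TrajectoryLibrary({"pushing": ["push_a", "push_b"], "unplug_charger": ["unplug_a"]})
for _ in range(10):
    lib.update("pushing", 0.0)         # tracked well in this toy run
    lib.update("unplug_charger", 1.0)  # still tracked poorly
print(lib.weights())
print(lib.sample())
```

In this sketch, a task that is still tracked poorly is drawn more often, while the uniform floor keeps well-learned tasks from disappearing from training. The method in the paper may balance tasks differently.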

Paper Structure

This paper contains 12 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The robot showcases its ability to perform multi-task whole-body loco-manipulation in the real world.
  • Figure 2: Overview of the pipeline. We use RL to train a whole-body control policy. A trajectory library, together with an adaptive sampling curriculum, supplies multiple real-world manipulation trajectories for the policy to track. An NAE encodes historical trajectories to estimate future targets, while the policy can also take future trajectories directly, supporting two deployment modes: teleoperation and automatic execution with a diffusion policy (DP). In addition, we train a supervised estimator to predict the base linear velocity $\hat{v}_{t}^{p}$.
  • Figure 3: A multi-computer, multi-thread system deployment architecture. The host computer captures the handheld gripper's trajectory (100 Hz) or runs the DP (10 Hz). The trajectory is sent to the lower computer via Ethernet. The Jetson Orin NX, mounted on the robot, handles multi-thread tasks, including receiving robot states (500 Hz), whole-body policy inference (50 Hz), and sending control commands (200 Hz).
  • Figure 4:
  • Figure 5: Policy network ablation analysis: (a) The 'unplug charger' and 'pushing' trajectories in simulation. (b) Arm joint torque and position curves for TVP and sMLP during the tasks, with sMLP showing more jitter. (c) MSE heatmap of future trajectory reconstruction by NAE for both tasks.
  • ...and 1 more figure
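
The loop rates listed in the Figure 3 caption imply simple ratios between the on-robot threads. The sketch below works through that arithmetic on a shared base clock. It is only an illustration: the names `RATES_HZ` and `tick_counts` are hypothetical, and the deployed system runs asynchronous threads rather than a single tick loop.

```python
from functools import reduce
from math import gcd

# Rates taken from the Figure 3 caption, in Hz.
RATES_HZ = {"state_rx": 500, "policy": 50, "control_tx": 200}


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def tick_counts(rates_hz, duration_s=1):
    """Count how often each loop fires on a shared base clock."""
    # The base clock is the slowest rate that every loop rate divides evenly.
    base_hz = reduce(lcm, rates_hz.values())
    counts = {name: 0 for name in rates_hz}
    for tick in range(base_hz * duration_s):
        for name, rate in rates_hz.items():
            # A loop fires once every base_hz // rate base ticks.
            if tick % (base_hz // rate) == 0:
                counts[name] += 1
    return base_hz, counts


base_hz, counts = tick_counts(RATES_HZ)
commands_per_inference = counts["control_tx"] // counts["policy"]
states_per_inference = counts["state_rx"] // counts["policy"]
print(base_hz, counts, commands_per_inference, states_per_inference)
```

Under these rates, each policy inference spans four control commands and ten state updates. The caption does not say how the commands between inferences are produced, so the sketch makes no assumption about that. By the same arithmetic, the 100 Hz handheld-gripper stream delivers two trajectory samples per 50 Hz policy step, and one 10 Hz DP output covers five policy steps.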