MLM: Learning Multi-task Loco-Manipulation Whole-Body Control for Quadruped Robot with Arm
Xin Liu, Bida Ma, Chenkun Qi, Yan Ding, Nuo Xu, Zhaxizhuoma, Guorong Zhang, Pengan Chen, Kehui Liu, Zhongjie Jia, Chuyue Guan, Yule Mo, Jiaqi Liu, Feng Gao, Jiangwei Zhong, Bin Zhao, Xuelong Li
TL;DR
This work tackles multi-task whole-body loco-manipulation for a quadruped robot with an arm by combining real-world and simulation data in a single reinforcement learning framework. It introduces a trajectory library with adaptive, curriculum-based sampling, and a Trajectory-Velocity Prediction network that handles deployment with only historical observations and tasks with different spatial ranges, achieving zero-shot sim-to-real transfer. The approach formulates control as a POMDP solved with an asymmetric actor-critic policy, uses terrain curricula and domain randomization, and is complemented by a diffusion-policy deployment option. Simulation and real-world experiments show effective multi-task execution, robustness to prediction errors, and scalable integration of new tasks, with demonstrations on a Go2 robot carrying an Airbot arm.
Abstract
Whole-body loco-manipulation for quadruped robots with arms remains a challenging problem, particularly in achieving multi-task control. To address this, we propose MLM, a reinforcement learning framework driven by both real-world and simulation data. It enables a quadruped robot equipped with a six-DoF robotic arm to perform whole-body loco-manipulation across multiple tasks, either autonomously or under human teleoperation. To balance multiple tasks while learning loco-manipulation, we introduce a trajectory library with an adaptive, curriculum-based sampling mechanism, which allows the policy to efficiently leverage trajectories collected in the real world. To handle deployment scenarios in which only historical observations are available, and to improve policy execution across tasks with different spatial ranges, we propose a Trajectory-Velocity Prediction policy network that predicts unobservable future trajectories and velocities. By leveraging extensive simulation data and curriculum-based rewards, our controller achieves whole-body behaviors in simulation and transfers zero-shot to real-world deployment. Ablation studies in simulation verify the necessity and effectiveness of each component, while real-world experiments on a Go2 robot with an Airbot robotic arm demonstrate that the policy performs well in multi-task execution.
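
As a rough illustration of what adaptive, curriculum-based sampling from a trajectory library could look like, the sketch below keeps a running tracking error per task and samples tasks with higher error more often. The abstract does not specify the sampling rule, so everything here is an assumption made for illustration: the class name `TrajectoryLibrary`, the softmax weighting, the `temperature` and `smoothing` parameters, and the use of tracking error as the difficulty signal are all hypothetical and are not taken from the paper.

```python
import math
import random


class TrajectoryLibrary:
    """Hypothetical library of reference trajectories grouped by task."""

    def __init__(self, trajectories_by_task, temperature=1.0, smoothing=0.9):
        # trajectories_by_task: dict mapping task name -> list of trajectories
        self.trajectories = dict(trajectories_by_task)
        self.temperature = temperature  # assumed knob: lower = sharper focus
        self.smoothing = smoothing      # assumed moving-average coefficient
        # Start with equal errors so that initial sampling is uniform.
        self.error = {task: 1.0 for task in self.trajectories}

    def probabilities(self):
        """Softmax over per-task errors (assumed rule, not the paper's)."""
        max_error = max(self.error.values())
        weights = {
            task: math.exp((err - max_error) / self.temperature)
            for task, err in self.error.items()
        }
        total = sum(weights.values())
        return {task: w / total for task, w in weights.items()}

    def sample(self, rng=random):
        """Pick a task by its probability, then a trajectory uniformly."""
        probs = self.probabilities()
        tasks = list(probs)
        task = rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]
        return task, rng.choice(self.trajectories[task])

    def update(self, task, tracking_error):
        """Blend the newest tracking error into the task's running error."""
        a = self.smoothing
        self.error[task] = a * self.error[task] + (1.0 - a) * tracking_error


# Usage: two placeholder tasks, each with one dummy end-effector trajectory.
library = TrajectoryLibrary({"wipe": [[0.0, 0.0, 0.0]], "pick": [[1.0, 1.0, 1.0]]})
library.update("pick", 5.0)  # pretend the policy tracked "pick" poorly
task, trajectory = library.sample()
```

In this sketch, a task that the policy tracks poorly is sampled more often until its running error falls, which is one simple way to keep training balanced across tasks. Other difficulty signals or weighting schemes would fit the same interface.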
