Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation
Kaiyan Xiao, Zihan Xu, Cheng Zhe, Chengju Liu, Qijun Chen
TL;DR
This work tackles high-load humanoid loco-manipulation by decoupling control into upper- and lower-body policies and introducing a delta-command module for world-frame end-effector tracking. It presents a three-stage RL framework with a heuristic, forward-kinematics-informed upper-body reward and a force-capable lower-body curriculum, enabling proactive environment interaction. The delta-command policy coordinates the upper- and lower-body policies, stabilizing the end-effector pose while preserving locomotion. Domain randomization and a sim-to-real demonstration on the Unitree G1 show robust real-world transfer, including pushing and pulling a 112.8 kg cart. The approach advances industrial loco-manipulation by achieving stable multi-task performance under heavy payloads and dynamic disturbances, while leaving a clear path to extending the supported force directions in future work.
Abstract
Humanoid robots, with their human-like morphology, hold great potential for industrial applications. However, existing loco-manipulation methods primarily focus on dexterous manipulation and fall short of the combined requirements for dexterity and proactive force interaction in high-load industrial scenarios. To bridge this gap, we propose a reinforcement learning-based framework with a decoupled three-stage training pipeline, consisting of an upper-body policy, a lower-body policy, and a delta-command policy. To accelerate upper-body training, we design a heuristic reward function that implicitly embeds forward-kinematics priors, enabling the policy to converge faster and achieve superior performance. For the lower body, we develop a force-based curriculum learning strategy that enables the robot to actively exert and regulate interaction forces with the environment.
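To make the two training ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-link planar arm in place of the Unitree G1 upper body, an exponential tracking reward computed through forward kinematics, and a simple success-rate rule for raising the commanded force limit. All function names, link lengths, and thresholds are illustrative assumptions.

```python
import math


def planar_fk(q1, q2, l1=0.3, l2=0.25):
    """Forward kinematics of a hypothetical 2-link planar arm (not the G1 model)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def fk_tracking_reward(q, target_xy, sigma=0.1):
    """Reward in (0, 1] that decays with the end-effector error obtained via FK.

    Assumed form: exp(-||p_fk(q) - p_target||^2 / sigma^2).
    """
    x, y = planar_fk(q[0], q[1])
    err_sq = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
    return math.exp(-err_sq / sigma ** 2)


def next_force_limit(current_limit, success_rate,
                     step=10.0, max_limit=100.0, threshold=0.8):
    """Assumed curriculum rule: raise the force limit (N) once the policy
    succeeds often enough at the current level, capped at max_limit."""
    if success_rate >= threshold:
        return min(current_limit + step, max_limit)
    return current_limit
```

In this sketch the reward equals 1 only when the joint configuration places the end effector exactly on the target, which gives the policy a dense signal shaped by the arm's kinematics. The curriculum function would be called once per evaluation window to decide whether the lower-body policy is ready for larger interaction forces.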
