Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation

Kaiyan Xiao, Zihan Xu, Cheng Zhe, Chengju Liu, Qijun Chen

TL;DR

This work tackles high-load humanoid loco-manipulation by decoupling control into upper- and lower-body policies and adding a delta-command policy for world-frame end-effector tracking. It presents a three-stage RL framework with a heuristic, forward-kinematics-informed upper-body reward and a force-based lower-body curriculum, enabling proactive interaction with the environment. The delta-command policy coordinates the upper- and lower-body policies, stabilizing the end-effector pose while preserving locomotion. Domain randomization and a sim-to-real demonstration on the Unitree G1 show robust real-world transfer, including pushing and pulling a 112.8 kg cart. The approach advances industrial loco-manipulation by achieving stable multi-task performance under heavy payloads and dynamic disturbances, and it leaves a clear path for extending force directions in future work.

Abstract

Humanoid robots, with their human-like morphology, hold great potential for industrial applications. However, existing loco-manipulation methods primarily focus on dexterous manipulation, falling short of the combined requirements for dexterity and proactive force interaction in high-load industrial scenarios. To bridge this gap, we propose a reinforcement learning-based framework with a decoupled three-stage training pipeline, consisting of an upper-body policy, a lower-body policy, and a delta-command policy. To accelerate upper-body training, a heuristic reward function is designed. By implicitly embedding forward kinematics priors, it enables the policy to converge faster and achieve superior performance. For the lower body, a force-based curriculum learning strategy is developed, enabling the robot to actively exert and regulate interaction forces with the environment.
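
As a rough illustration of what a force-based curriculum can look like, the sketch below samples a target interaction force whose range grows with a curriculum level. It is not the authors' implementation: the function names, the integer levels, the signed scalar force, the promotion thresholds, and the `low_force_prob` parameter are all assumptions made for this example. Mixing zero-force commands back in at every level is likewise an assumption, motivated by the sampling strategy the paper compares in Figure 3.

```python
import random


def sample_force_command(level, max_level, f_max, low_force_prob=0.3, rng=random):
    """Sample a target interaction force (N) for one training episode.

    All parameters are hypothetical, chosen for illustration only:
    level, max_level -- curriculum progress, with 0 <= level <= max_level
    f_max            -- largest force magnitude allowed at the final level
    low_force_prob   -- chance of commanding zero force at any level, so that
                        unloaded walking keeps being practised
    """
    if max_level <= 0 or not 0 <= level <= max_level:
        raise ValueError("need max_level > 0 and 0 <= level <= max_level")
    if rng.random() < low_force_prob:
        return 0.0
    # The admissible force range widens linearly with the curriculum level.
    cap = f_max * level / max_level
    # A signed scalar stands in for pushing versus pulling along one axis.
    return rng.uniform(-cap, cap)


def update_level(level, max_level, tracking_score, promote_at=0.8, demote_at=0.4):
    """Move one level up or down based on a tracking score in [0, 1]."""
    if tracking_score >= promote_at:
        return min(level + 1, max_level)
    if tracking_score <= demote_at:
        return max(level - 1, 0)
    return level


if __name__ == "__main__":
    rng = random.Random(0)
    level = 0
    for episode in range(5):
        force = sample_force_command(level, max_level=4, f_max=100.0, rng=rng)
        print(f"episode {episode}: level {level}, target force {force:.1f} N")
        level = update_level(level, max_level=4, tracking_score=0.9)
```

Setting `low_force_prob` to zero reduces this to a plain curriculum that only ever widens the force range, which is the kind of setup Figure 3 reports as drifting toward high-force behavior.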

Paper Structure

This paper contains 16 sections, 15 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: System architecture of the proposed training pipeline. The diagram illustrates the integration of the upper-body policy, lower-body policy, and delta-command policy for coordinated loco-manipulation tasks. When observation data are propagated, filled circles indicate the output of data in the corresponding color, whereas hollow circles indicate data not output.
  • Figure 2: Task space and imposed constraints of the upper body. (a) Task space of the upper body. The task spaces of the left and right end-effectors are symmetric with respect to the $x$-axis of the torso frame. The green cuboid illustrates the set of feasible positions, whereas the yellow cone indicates the feasible orientation range of the end-effector, with the aperture angle $\delta$ defining the orientation limits. (b) Illustration of the angle between the elbow link's direction and the end-effector's direction.
  • Figure 3: Comparison of training outcomes with and without the sampling strategy. Both (a) and (b) are trained using curriculum learning, but only (a) incorporates the sampling strategy. Green arrows denote target velocities, and blue arrows denote actual velocities. The target linear velocity is set to $v^{*}_{t} = 0.7$ m/s, while the target force is $f^{*}_{t} = 0$ N. Without the sampling strategy, the policy tends to converge toward high-force scenarios, resulting in degraded locomotion performance under low-force conditions. In contrast, the agent trained with the sampling strategy exhibits improved generalization to varying force levels.
  • Figure 4: Comparison of end-effector height trajectories in simulation, with (red) and without (blue) the delta-command policy. The black curve represents the torso height variation over time. During data collection, the robot is commanded to move along the $x$-axis at a velocity of 0.7 m/s.
  • Figure 5: Position and orientation errors of the left end-effector during training for four ablation groups: optimal $\sigma$ (heuristic reward with optimal $\sigma$), poor $\sigma$ (heuristic reward with undesirable $\sigma$), w/o pose (without task-space term), and w/o joint (without joint-space term).
  • ...and 4 more figures
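
The feasible task space described in the Figure 2(a) caption, a cuboid of positions plus a cone of orientations, can be sketched as a simple membership test. The code below is only an illustration under stated assumptions: the function names, the box bounds, and the cone axis are hypothetical, and $\delta$ is treated as the half-angle of the cone in radians, which the caption does not specify.

```python
import math


def in_position_box(p, lower, upper):
    """True if point p lies inside the axis-aligned cuboid [lower, upper]."""
    return all(lo <= x <= hi for x, lo, hi in zip(p, lower, upper))


def in_orientation_cone(direction, cone_axis, delta):
    """True if the angle between direction and cone_axis is at most delta.

    delta is taken to be the cone half-angle in radians (an assumption).
    """
    dot = sum(a * b for a, b in zip(direction, cone_axis))
    norm = math.sqrt(sum(a * a for a in direction)) * math.sqrt(
        sum(b * b for b in cone_axis)
    )
    if norm == 0.0:
        raise ValueError("direction and cone_axis must be non-zero vectors")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_angle) <= delta


def is_feasible_target(p, direction, lower, upper, cone_axis, delta):
    """Combine the position and orientation checks for one end-effector target."""
    return in_position_box(p, lower, upper) and in_orientation_cone(
        direction, cone_axis, delta
    )


if __name__ == "__main__":
    # Hypothetical bounds in the torso frame (metres); not taken from the paper.
    lower, upper = (0.1, 0.0, -0.2), (0.5, 0.4, 0.3)
    forward = (1.0, 0.0, 0.0)
    ok = is_feasible_target(
        (0.3, 0.2, 0.1), (1.0, 0.2, 0.0), lower, upper, forward, math.radians(30)
    )
    print("target feasible:", ok)
```

A test of this kind could be used to reject or resample end-effector targets during training so that commands stay within the reachable region, though whether the authors sample targets this way is not stated in the captions.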