H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies

Geeta Chandra Raju Bethala, Hao Huang, Niraj Pudasaini, Abdullah Mohamed Ali, Shuaihang Yuan, Congcong Wen, Anthony Tzes, Yi Fang

TL;DR

This work is the first to demonstrate learned haptic guidance fused with full-body legged control for fluid human-humanoid co-manipulation.

Abstract

We present a hierarchical policy-learning framework that enables a legged humanoid to cooperatively carry extended loads with a human partner using only haptic cues for intent inference. At the upper tier, a lightweight behavior-cloning network consumes six-axis force/torque streams from dual wrist-mounted sensors and outputs whole-body planar velocity commands that capture the leader's applied forces. At the lower tier, a deep-reinforcement-learning policy, trained under randomized payloads (0-3 kg) and friction conditions in Isaac Gym and validated in MuJoCo and on a real Unitree G1, maps these high-level twists to stable joint trajectories under load. By decoupling intent interpretation (force -> velocity) from legged locomotion (velocity -> joints), our method combines intuitive responsiveness to human inputs with robust, load-adaptive walking. We collect training data without motion capture or markers, using only synchronized RGB video and F/T readings, and employ SAM2 and WHAM to extract 3D human pose and velocity. In real-world trials, our humanoid achieves cooperative carry-and-move performance on par with a blindfolded human-follower baseline in completion time, trajectory deviation, velocity synchrony, and follower force. This work is the first to demonstrate learned haptic guidance fused with full-body legged control for fluid human-humanoid co-manipulation. Code and videos are available on the H2-COMPACT website.
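The force -> velocity -> joints decoupling described in the abstract can be sketched as a two-tier control step. This is a minimal illustration under stated assumptions, not the authors' implementation: `intent_policy` is a hypothetical admittance-style gain map standing in for the learned behavior-cloning network, `hold_pose` is a placeholder for the RL locomotion policy, and the gains, handle half-width, and velocity limits are made-up values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Wrench:
    """Six-axis force/torque reading from one wrist sensor (N, N*m), robot frame."""
    fx: float
    fy: float
    fz: float
    tx: float
    ty: float
    tz: float


@dataclass
class Twist:
    """Planar velocity command: forward/lateral speed (m/s) and yaw rate (rad/s)."""
    vx: float
    vy: float
    wz: float


def clamp(x: float, limit: float) -> float:
    """Saturate x to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, x))


def intent_policy(left: Wrench, right: Wrench, gain: float = 0.02,
                  yaw_gain: float = 0.1, half_width: float = 0.3) -> Twist:
    """Upper tier (force -> velocity). HYPOTHETICAL stand-in for the learned
    behavior-cloning network: a fixed admittance-style gain map."""
    # Net planar push/pull from both handles (x forward, y left).
    fx = left.fx + right.fx
    fy = left.fy + right.fy
    # Left handle at y = +half_width, right at y = -half_width, so unequal
    # forward forces give tau_z = half_width * (F_right_x - F_left_x);
    # positive tau_z turns the robot to the left.
    tau_z = half_width * (right.fx - left.fx)
    return Twist(gain * fx, gain * fy, yaw_gain * tau_z)


def hold_pose(cmd: Twist, joint_pos: Sequence[float]) -> List[float]:
    """Lower tier (velocity -> joints). PLACEHOLDER for the trained RL policy;
    it ignores the command and simply returns the current joint positions."""
    return list(joint_pos)


@dataclass
class HierarchicalController:
    intent: Callable[[Wrench, Wrench], Twist]
    locomotion: Callable[[Twist, Sequence[float]], List[float]]
    v_max: float = 0.5  # hypothetical linear-speed limit (m/s)
    w_max: float = 0.5  # hypothetical yaw-rate limit (rad/s)

    def step(self, left: Wrench, right: Wrench, joint_pos: Sequence[float]):
        """One control tick: wrenches -> saturated twist -> joint targets."""
        raw = self.intent(left, right)
        cmd = Twist(clamp(raw.vx, self.v_max),
                    clamp(raw.vy, self.v_max),
                    clamp(raw.wz, self.w_max))
        return cmd, self.locomotion(cmd, joint_pos)


if __name__ == "__main__":
    ctrl = HierarchicalController(intent_policy, hold_pose)
    # The leader pulls harder on the right handle: forward motion plus a left turn.
    cmd, targets = ctrl.step(Wrench(2, 0, 0, 0, 0, 0),
                             Wrench(8, 0, 0, 0, 0, 0), [0.1, 0.2])
    print(cmd, targets)
```

With these illustrative gains the example yields roughly vx = 0.2 m/s, vy = 0, and wz = 0.18 rad/s. The point of the structure is that either tier can be swapped independently: a learned intent model replaces `intent_policy` without retraining the locomotion policy, and vice versa, because the two communicate only through the planar twist.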

Paper Structure

This paper contains 24 sections, 22 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Real‐world human–humanoid co‐manipulation. The human leads the humanoid robot, which is unaware of the route or goal, guiding it via haptic cues to jointly carry the load along the dashed path.
  • Figure 2: H²-COMPACT's pipeline: raw force/torque and RGB inputs are cleaned by SAM2 and WHAM, then passed through a diffusion-based haptic intent model to generate high-level velocities, which a PPO-trained policy converts into humanoid joint commands.
  • Figure 3: Overview of experimental hardware. From left to right: (a) leader–follower rig with follower’s computation backpack; (b) custom instrumented box, 3D-printed handles and ATI Mini45 F/T sensors; (c) on-robot deployment, ATI Mini45 sensors mounted on the Unitree G1 wrists; (d) custom Vicon marker mounts for motion-capture evaluation.
  • Figure 4: The eight motion primitives executed during dyadic data collection.
  • Figure 5: Position tracking of the humanoid and human followers relative to the leader in both scenarios, along with the corresponding position error between the leader and each follower. Top: Motion along the x-axis. Bottom: Motion along the y-axis.
  • ...and 2 more figures
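Figure 5 and the trajectory-deviation metric in the abstract both compare follower and leader positions over time. The exact error definition is not given in this summary, so the following is a minimal sketch under an assumed one: time-aligned samples along a single axis, error = follower - leader - a nominal carrying offset (`offset`, a hypothetical parameter), summarized per trial by its root-mean-square. The sample numbers are illustrative, not data from the paper.

```python
from math import sqrt
from typing import List, Sequence


def position_errors(leader: Sequence[float], follower: Sequence[float],
                    offset: float = 0.0) -> List[float]:
    """Per-sample follower-minus-leader error along one axis (m).

    ASSUMPTIONS: both trajectories are already time-synchronized and sampled
    at the same rate; `offset` is a hypothetical nominal spacing between the
    partners (e.g. the box length) that is subtracted out.
    """
    if len(leader) != len(follower) or not leader:
        raise ValueError("need equal-length, non-empty, time-aligned trajectories")
    return [f - l - offset for l, f in zip(leader, follower)]


def rms(errors: Sequence[float]) -> float:
    """Root-mean-square of the errors: one scalar deviation per trial (m)."""
    return sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    # Illustrative numbers only (not data from the paper).
    leader_x = [0.0, 0.1, 0.2, 0.3]
    follower_x = [-1.0, -0.92, -0.8, -0.71]
    errs = position_errors(leader_x, follower_x, offset=-1.0)
    print([round(e, 3) for e in errs], round(rms(errs), 4))
```

Running the same two functions once for the x-axis and once for the y-axis gives per-axis error traces of the kind plotted in Figure 5. A different offset convention (for example, one derived from the rigid-box geometry at each timestep) would change only `position_errors`.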