Pro-HOI: Perceptive Root-guided Humanoid-Object Interaction

Yuhang Lin, Jiyuan Shi, Dewei Wang, Jipeng Kong, Yong Liu, Chenjia Bai, Xuelong Li

TL;DR

This work introduces Perceptive Root-guided Humanoid-Object Interaction (Pro-HOI), a generalizable framework for robust humanoid loco-manipulation, and proposes a novel training framework that conditions the policy on a desired root trajectory while utilizing reference motion exclusively as a reward.

Abstract

Executing reliable Humanoid-Object Interaction (HOI) tasks for humanoid robots is hindered by the lack of generalized control interfaces and robust closed-loop perception mechanisms. In this work, we introduce Perceptive Root-guided Humanoid-Object Interaction (Pro-HOI), a generalizable framework for robust humanoid loco-manipulation. First, we collect box-carrying motions that are suitable for real-world deployment and reduce penetration artifacts by optimizing a Signed Distance Field (SDF) loss. Second, we propose a novel training framework that conditions the policy on a desired root trajectory while utilizing reference motion exclusively as a reward. This design not only eliminates the need for intricate reward tuning but also establishes the root trajectory as a universal interface for high-level planners, enabling simultaneous navigation and loco-manipulation. Furthermore, to ensure operational reliability, we incorporate a persistent object estimation module. By fusing real-time detection with a Digital Twin, this module allows the robot to autonomously detect slippage and trigger re-grasping maneuvers. Empirical validation on a Unitree G1 robot demonstrates that Pro-HOI significantly outperforms baselines in generalization and robustness, achieving reliable long-horizon execution in complex real-world scenarios.
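
As a rough illustration of the Signed Distance Field penalty mentioned in the abstract, the Python sketch below scores how deeply a set of body points penetrates a box. It is not the authors' implementation. The analytic box SDF, the function names `box_sdf` and `penetration_loss`, the squared-depth form of the loss, and the `margin` parameter are all assumptions made for this example.

```python
import numpy as np


def box_sdf(points, half_extents):
    """Signed distance from points of shape (N, 3), expressed in the box frame,
    to an axis-aligned box centred at the origin. Negative means inside."""
    q = np.abs(points) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(np.max(q, axis=1), 0.0)
    return outside + inside


def penetration_loss(points, half_extents, margin=0.0):
    """Mean squared penetration depth (hypothetical loss form).
    Zero when every point has a signed distance of at least `margin`."""
    depth = np.maximum(margin - box_sdf(points, half_extents), 0.0)
    return float(np.mean(depth ** 2))


# Illustrative values only: a 0.4 x 0.3 x 0.2 m box and two sample points.
half_extents = np.array([0.2, 0.15, 0.1])
hand_points = np.array([[0.0, 0.0, 0.0],   # at the box centre (penetrating)
                        [0.5, 0.0, 0.0]])  # well outside the box
loss = penetration_loss(hand_points, half_extents)
```

In an optimization loop, a term like this would be minimized over the motion parameters so that body points are pushed out of the object; a differentiable framework would replace NumPy in that setting.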

Paper Structure (46 sections, 9 equations, 5 figures, 8 tables)

Figures (5)

  • Figure 1: Overview of Pro-HOI. a) Data Preparation: We first collect human motion clips using a mocap system and augment them with object geometries in Blender. We then use SDF-based optimization to generate physically feasible reference motions. b) Root-Guided Policy Learning: The RL policy is trained to perform whole-body interaction skills conditioned on the desired root trajectory, while utilizing the reference motion as rewards. c) Real-world Deployment: We integrate FoundationPose for 6D object pose estimation and FAST-LIO2 [xu2022fast] for root pose estimation, combined with the Interaction Pose Prior and a task-specific planner to generate target root trajectories. The full stack executes entirely onboard a Jetson NX with a D435i camera and a Mid-360 LiDAR, achieving robust sim-to-real transfer.
  • Figure 2: Spatial distribution of grasp success rates across the Out-of-Distribution (OOD) evaluation set. The color scale indicates the success rate, where darker green denotes higher stability and red/yellow indicates failure.
  • Figure 3: Statistical distribution of placement precision across different methods.
  • Figure 4: The percentages quantify the likelihood of successfully detecting an object dropped within each respective region. (a) The detection probability distribution simulated via the object estimation module. (b) The baseline detection probability distribution relying exclusively on the onboard camera.
  • Figure 5: An example of trajectory generation with contact information and simulated tracking contact.
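
To make the root-guided idea in Figure 1(b) more concrete, here is a minimal Python sketch of a policy observation that carries only the desired root trajectory, with the reference motion entering solely through the reward. This is an illustration under stated assumptions rather than the paper's method. The names `build_observation` and `tracking_reward`, the exponential reward form, the `sigma` scale, and all array sizes are hypothetical.

```python
import numpy as np


def build_observation(proprio, root_targets):
    """Policy input: proprioception plus flattened future root targets.
    No reference joint angles are included (assumed observation layout)."""
    return np.concatenate([np.asarray(proprio), np.asarray(root_targets).ravel()])


def tracking_reward(joint_pos, ref_joint_pos, sigma=0.5):
    """Exponential tracking reward in (0, 1] (hypothetical form).
    Equals 1 when the robot pose matches the reference pose exactly."""
    err = np.sum((np.asarray(joint_pos) - np.asarray(ref_joint_pos)) ** 2)
    return float(np.exp(-err / sigma ** 2))


# Arbitrary sizes chosen only for the example.
n_joints, horizon = 12, 5
proprio = np.zeros(2 * n_joints)        # e.g. joint positions and velocities
root_targets = np.zeros((horizon, 3))   # e.g. desired (x, y, yaw) per future step
obs = build_observation(proprio, root_targets)

ref_pose = np.full(n_joints, 0.1)       # reference motion frame, used in the reward only
reward = tracking_reward(np.zeros(n_joints), ref_pose)
```

Because the reference pose never appears in `obs`, a high-level planner only has to supply root targets at deployment time, which is the interface role the abstract attributes to the root trajectory.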