Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations

Ruiqian Nai; Boyuan Zheng; Junming Zhao; Haodong Zhu; Sicong Dai; Zunhao Chen; Yihang Hu; Yingdong Hu; Tong Zhang; Chuan Wen; Yang Gao

TL;DR

Humanoid whole-body manipulation remains data-inefficient when it relies on teleoperation or sim-to-real RL. HuMI collects robot-free demonstrations with portable hardware and feeds them to a hierarchical learning pipeline that combines a diffusion-based high-level policy, a manipulation-centric low-level controller, and IK-aware data collection. The approach is presented as a first robot-free demonstration system for humanoid whole-body manipulation. It achieves up to 3× higher data throughput than teleoperation and a 70% success rate in unseen environments across five tasks, and it generalizes to unseen objects. IK previews, adaptive end-effector tracking, and a carefully designed policy interface together enable coordinated, high-precision whole-body skills that carry over beyond controlled lab settings.

Abstract

Current approaches for humanoid whole-body manipulation, primarily relying on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex reward engineering. Consequently, demonstrated autonomous skills remain limited and are typically restricted to controlled environments. In this paper, we present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks across various environments. HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware. This data drives a hierarchical learning pipeline that translates human motions into dexterous and feasible humanoid skills. Extensive experiments across five whole-body tasks (kneeling, squatting, tossing, walking, and bimanual manipulation) demonstrate that HuMI achieves a 3× increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.

Paper Structure (34 sections, 6 equations, 13 figures, 3 tables)

Introduction
Method
Robot-Free Demonstration System
Manipulation-Centric Whole-Body Controller
Policy Interface for Improved System Integration
Experiments
Whole-Body Manipulation Capability
Learning Feasible Whole-Body Skills from Robot-Free Demonstrations
Precise Humanoid Bimanual Manipulation
Temporally Coherent Dynamic Control
Long-Range Loco-Manipulation
Generalization Ability
Data Collection Efficiency
Related Works
Conclusions and Limitations
...and 19 more sections

Figures (13)

Figure 1: Humanoid Manipulation Interface (HuMI). Left: Our portable, robot-free data collection facilitates skill transfer from human to humanoid across diverse, unstructured environments. Right: The framework enables a wide repertoire of complex whole-body behaviors.
Figure 2: Overview of the HuMI data collection system. (a) Challenges: Relying solely on gripper poses under-specifies whole-body motion, leading to unnatural postures (top); meanwhile, naively scaling human motions to match the robot's size compromises the spatial alignment required for object interaction (bottom). (b) Hardware Setup: Our portable system utilizes handheld sensorized grippers and trackers on the grippers, waist, and feet. A real-time IK preview interface enables human-in-the-loop kinematic adaptation. (c) Data Processing: Collected data serves two purposes: visual observations and task-space SE(3) trajectories train the high-level policy, while whole-body IK solutions provide reference motions for the low-level controller.
Figure 3: Hierarchical control framework of HuMI. (1) A high-level Diffusion Policy (5Hz) processes camera images and proprioception to generate receding-horizon task-space trajectories (action chunks). (2) A low-level Whole-Body Controller (50Hz) tracks these keypoint targets $p_t$, integrating the current robot state $s_t$ (IMU, joint positions/velocities) to compute precise joint actuation commands $a_t$.
Figure 4: Impact of reference frame selection on action chunk continuity. Due to tracking error, the executed robot pose (dark gray) "lags" behind the scheduled target (light gray). Naively anchoring the next action chunk to the current executed pose results in a sudden trajectory reversal (red line), disrupting momentum. By instead using the previous scheduled target as the reference frame, the policy produces a smooth, continuous trajectory (green line) that maintains the intended motion profile.
Figure 5: Mitigating drift in non-vision-grounded keypoints. Left: Trajectories during a doll-grasping task. The "sighted" gripper (green) remains anchored via visual feedback, whereas the "blind" pelvis (red) suffers from open-loop drift ($>5$ cm) over time. Right: Decomposition of the action chunk at time $t$. Because the absolute height (left axis) is corrupted by cumulative error, we discard absolute tracking in favor of relative transforms within the chunk (right axis).
...and 8 more figures
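
The ideas in the captions of Figures 4 and 5 can be illustrated with a small numeric sketch. This is not the authors' implementation. It assumes a 1-D position in integer millimetres instead of the SE(3) trajectories the paper uses, a fixed 50 mm tracking lag, and a drift that is constant within one chunk. The function names `anchor_chunk` and `relative_within_chunk`, and all numbers, are hypothetical.

```python
# Illustrative 1-D sketch (positions in integer millimetres); the source works
# with SE(3) keypoint trajectories. All names and numbers are hypothetical.

def anchor_chunk(reference_mm, offsets_mm):
    """Convert chunk-relative offsets into absolute targets from a reference."""
    return [reference_mm + d for d in offsets_mm]


def relative_within_chunk(absolute_mm):
    """Express each chunk entry relative to the chunk's first entry."""
    first = absolute_mm[0]
    return [v - first for v in absolute_mm]


# Figure 4 idea: the policy asks for steady forward motion, 20 mm per step.
offsets = [20, 40, 60, 80]
previous_chunk = anchor_chunk(0, offsets)
last_scheduled = previous_chunk[-1]   # where the plan says the keypoint should be
executed = last_scheduled - 50        # assumed 50 mm tracking lag

naive_chunk = anchor_chunk(executed, offsets)         # anchored to executed pose
smooth_chunk = anchor_chunk(last_scheduled, offsets)  # anchored to scheduled target

# The naive chunk starts behind the last scheduled target (a reversal);
# the scheduled-target anchor keeps moving forward.
naive_reverses = naive_chunk[0] < last_scheduled
smooth_continues = smooth_chunk[0] > last_scheduled

# Figure 5 idea: a constant offset corrupts absolute pelvis heights, but the
# relative motion inside the chunk is unchanged.
true_heights = [700, 690, 680, 670]               # a slow squat, hypothetical
drifted_heights = [h + 60 for h in true_heights]  # assumed 60 mm drift
drift_cancels = (relative_within_chunk(true_heights)
                 == relative_within_chunk(drifted_heights))
```

With these numbers, the naively anchored chunk begins at 50 mm even though the previous plan had already scheduled 80 mm, which is the backward jump that Figure 4 shows in red. The chunk anchored to the scheduled target begins at 100 mm and keeps the motion profile. In the second half, both height lists reduce to the same relative sequence, which is why relative transforms are insensitive to an offset that is constant over the chunk.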
