HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton

Rui Zhong, Yizhe Sun, Junjie Wen, Jinming Li, Chuang Cheng, Wei Dai, Zhiwen Zeng, Huimin Lu, Yichen Zhu, Yi Xu

TL;DR

This work tackles the data bottleneck in scalable humanoid policy learning by introducing HumanoidExo, a wearable exoskeleton system that captures human whole-body motion and translates it into robot-ready data. The authors pair this with HE-VLA, a hybrid Vision-Language-Action and reinforcement learning framework, to bootstrap learning from exoskeleton demonstrations and ensure robust balance for real-world execution on a Unitree G1 humanoid. Across three tasks, HumanoidExo data improves generalization and data efficiency: complex whole-body control is learned from as few as five real-robot demonstrations, and a new skill (walking) is acquired solely from HumanoidExo data. The results suggest that exoskeleton-driven data can substitute for or complement teleoperation, offering a scalable path toward generalist humanoid policies with strong real-world applicability.

Abstract

A significant bottleneck in humanoid policy learning is the acquisition of large-scale, diverse datasets, as collecting reliable real-world data remains both difficult and cost-prohibitive. To address this limitation, we introduce HumanoidExo, a novel system that transfers human motion to whole-body humanoid data. HumanoidExo offers a high-efficiency solution that minimizes the embodiment gap between the human demonstrator and the robot, thereby tackling the scarcity of whole-body humanoid data. By facilitating the collection of more voluminous and diverse datasets, our approach significantly enhances the performance of humanoid robots in dynamic, real-world scenarios. We evaluated our method across three challenging real-world tasks: table-top manipulation, manipulation integrated with stand-squat motions, and whole-body manipulation. Our results empirically demonstrate that HumanoidExo is a crucial addition to real-robot data, as it enables the humanoid policy to generalize to novel environments, learn complex whole-body control from only five real-robot demonstrations, and even acquire new skills (i.e., walking) solely from HumanoidExo data.

Paper Structure

This paper contains 16 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Hardware overview for HumanoidExo. We integrated a Mid-360 LiDAR for acquiring exoskeleton motion odometry. For visual information acquisition, we added two wrist cameras to capture new operational perspectives and enrich environmental perception. These cameras, installed on the Dexmo force-feedback gloves, were mounted at angles identical to those of the robot’s cameras. Since the HumanoidExo system adopts a joint space control method with angle remapping, we redesigned the exoskeleton’s key parameters to match the arm length of the Unitree G1 robot. In addition, we recruited data collectors with upper-body dimensions similar to the G1’s anthropometric parameters for data collection and teleoperation, minimizing end-effector errors arising from dimensional mismatches.
  • Figure 2: The overview of the HE-VLA. The left side of the figure illustrates our data collection process and the composition of the dataset, with the primary source of training data being in-the-wild data gathered by the HumanoidExo independently of the controlled robot. The right side of the figure presents the model deployment and inference pipeline. We employ two systems to control the robot: a Vision-Language-Action (VLA) model and a Reinforcement Learning (RL) model. The VLA model generates high-level control commands and transmits them to the lower-level RL model. The RL model is then responsible for maintaining balance control and executing the joint movement commands.
  • Figure 3: The actor-critic reinforcement learning in HE-VLA. This module works in conjunction with the primary VLA model, guaranteeing the humanoid can reliably stand, squat, and walk during policy inference.
  • Figure 4: Examples for PlaceToy (Task 1), Walk & PlaceToy (Task 2), and PlaceLaundry (Task 3). We designed three tasks to showcase the effectiveness of HumanoidExo in robot skill learning: Task 1 tests dexterity, Task 2 combines locomotion and manipulation, with the mobile-manipulation data entirely collected by HumanoidExo, and Task 3 involves whole-body manipulation of the humanoid robot.
  • Figure 5: Model Generalization. (a) Model success rates. Labels A-E correspond to the robot's success rate for grasping items A-E shown in (c), and the number of trials for each experiment is 60. (b) Item placement locations for model testing. (c) Training datasets. Out-of-domain data represents items that did not appear in the Teleoperated Demonstrations but were present in the HumanoidExo Demonstrations. (d) & (e) Tasks in a new environment. The robot's success rate for completing the task is represented by the label F in (a). (f) Robustness to disturbance. The robot could autonomously walk back to the table and resume the tabletop task (Task 2) after being forcibly moved away.
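
The two-system control flow described in the Figure 2 caption (a VLA model issuing high-level commands to a lower-level RL model that keeps balance and executes joint movements) can be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the `HighLevelCommand` fields, the function names, the stub policies, and the idea of re-querying the high-level policy every `low_per_high` low-level ticks are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class HighLevelCommand:
    # Hypothetical command format; the paper's actual interface is not given here.
    arm_joint_targets: List[float]  # desired upper-body joint angles (rad)
    base_velocity: float            # desired forward velocity (m/s)
    base_height: float              # desired torso height (m), for stand/squat


def run_hierarchy(
    high_level: Callable[[dict], HighLevelCommand],
    low_level: Callable[[dict, HighLevelCommand], List[float]],
    read_observation: Callable[[], dict],
    send_joint_targets: Callable[[Sequence[float]], None],
    steps: int,
    low_per_high: int = 5,
) -> int:
    """Run `steps` low-level ticks and return the number of high-level queries.

    The high-level policy is re-queried every `low_per_high` ticks (an assumed
    schedule); the low-level policy runs on every tick with the latest command.
    """
    command: Optional[HighLevelCommand] = None
    queries = 0
    for t in range(steps):
        obs = read_observation()
        if t % low_per_high == 0:
            command = high_level(obs)  # slow loop: vision-language-action model
            queries += 1
        # fast loop: balance-aware controller turns the command into joint targets
        send_joint_targets(low_level(obs, command))
    return queries


# --- Stand-ins so the sketch runs without a robot or trained models ---
def stub_vla(obs: dict) -> HighLevelCommand:
    # A real VLA would condition on camera images and a language instruction.
    return HighLevelCommand([0.0, 0.0, 0.0, 0.0], base_velocity=0.2, base_height=0.7)


def stub_rl(obs: dict, cmd: HighLevelCommand) -> List[float]:
    # A real RL controller would output joint targets that maintain balance;
    # this stub just echoes the command so the data flow is visible.
    return cmd.arm_joint_targets + [cmd.base_velocity, cmd.base_height]


sent: List[Sequence[float]] = []
queries = run_hierarchy(
    stub_vla,
    stub_rl,
    read_observation=lambda: {"image": None, "proprio": []},
    send_joint_targets=sent.append,
    steps=10,
    low_per_high=5,
)
```

Passing the policies and I/O functions in as callables keeps the loop independent of any particular model or robot SDK, so the stubs can be swapped for real components without changing the scheduling logic.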