HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

Zhi Jing, Siyuan Yang, Jicong Ao, Ting Xiao, Yu-Gang Jiang, Chenjia Bai

TL;DR

HumanoidGen addresses the data bottleneck in bimanual dexterous humanoid manipulation by automating task and demonstration generation through LLM-driven planning that encodes relational spatial constraints. It leverages spatial annotations for assets and hands, a constraint-based planner to produce executable motion scripts, and an STCR-MCTS framework to improve long-horizon reasoning when annotations are incomplete. The authors build HGen-Bench with 20 diverse tasks on a Unitree H1-2 platform in SAPIEN, and show that diffusion-policy training benefits from larger, more varied datasets, while MCTS enhances planning reliability and diversity. Real-world experiments and automatic asset-annotation evaluation further support the framework’s effectiveness and scalability for sim-to-real research in humanoid bimanual manipulation.

Abstract

Existing robotics datasets and simulation benchmarks for manipulation predominantly cater to robot-arm platforms. However, for humanoid robots equipped with dual arms and dexterous hands, simulation tasks and high-quality demonstrations are notably lacking. Bimanual dexterous manipulation is inherently more complex, as it requires coordinated arm movements and hand operations, making autonomous data collection challenging. This paper presents HumanoidGen, an automated task creation and demonstration collection framework that leverages atomic dexterous operations and LLM reasoning to generate relational constraints. Specifically, we provide spatial annotations for both assets and dexterous hands based on the atomic operations, and use an LLM planner to generate a chain of actionable spatial constraints for arm movements based on object affordances and scenes. To further improve planning ability, we employ a variant of Monte Carlo tree search to enhance LLM reasoning for long-horizon tasks and for cases with insufficient annotation. In experiments, we create a novel benchmark with augmented scenarios to evaluate the quality of the collected data. The results show that the performance of the 2D and 3D diffusion policies can scale with the generated dataset. The project page is https://openhumanoidgen.github.io.

Paper Structure

This paper contains 54 sections, 9 equations, 13 figures, 12 tables.

Figures (13)

  • Figure 1: The overview of HumanoidGen. It includes spatial annotations, scene generation, constraint generation, MCTS-enhanced reasoning, data collection, scene scaling, and policy evaluation.
  • Figure 1: The evaluation results for different maximum numbers of MCTS exploration steps $N$, and for planning without MCTS, on four tasks.
  • Figure 2: The spatial annotations, including key points and key axes for assets and hands, as well as the atomic operations of hands that include grasp, pinch, and press.
  • Figure 3: An illustration of the generated plan for the block storage task. The LLM is prompted with a task description, scene layout, and asset attributes to generate a step sequence. Each step is expressed using an atomic operation, along with its corresponding annotations. During plan execution, (i) in steps 0-5, the left hand pulls out the drawer by grasping its handle, while the right hand simultaneously pinches and lifts the cube; (ii) in steps 6-9, the left hand takes the cube from the right hand; (iii) in steps 10-15, the left hand places the cube into the drawer and pushes the drawer back by grasping its handle. In steps 6-7, the LLM plans a collision-free approach instead of moving directly to the pinching pose, which would cause a collision, demonstrating active collision avoidance. Additionally, the LLM proactively generates code to account for potential collisions with the drawer when performing free motion in the compact workspace, as illustrated in the bottom part.
  • Figure 4: HGen-Bench includes various dexterous bimanual manipulation tasks of varying difficulty. We provide different observation information and deploy the tasks in a home scene.
  • ...and 8 more figures
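
The step sequence described in the Figure 3 caption can be written down as plain data, which makes its structure (three contiguous phases over steps 0 to 15, each tied to atomic hand operations) easier to inspect. The sketch below is only an illustration of that caption: the `Phase` class, the `check_phases` and `phase_for_step` helpers, and the field names are all hypothetical and are not taken from the HumanoidGen code. Operations are filled in only where the caption names them, so the handover phase lists none.

```python
from dataclasses import dataclass

# Atomic hand operations named in the Figure 2 caption.
ATOMIC_OPERATIONS = frozenset({"grasp", "pinch", "press"})


@dataclass(frozen=True)
class Phase:
    """A contiguous run of plan steps (hypothetical structure, for illustration)."""
    first_step: int
    last_step: int
    summary: str
    # (hand, atomic operation, target) triples stated explicitly in the caption.
    operations: tuple = ()


# The block storage plan as summarized in the Figure 3 caption.
BLOCK_STORAGE_PHASES = (
    Phase(0, 5, "left hand pulls out the drawer; right hand lifts the cube",
          (("left", "grasp", "drawer handle"), ("right", "pinch", "cube"))),
    # The caption does not name the operation used for the handover.
    Phase(6, 9, "left hand takes the cube from the right hand"),
    Phase(10, 15, "left hand places the cube in the drawer and pushes it back",
          (("left", "grasp", "drawer handle"),)),
)


def check_phases(phases):
    """Check that phases are contiguous from step 0 and use known operations.

    Returns the total number of steps covered.
    """
    expected_start = 0
    for phase in phases:
        if phase.first_step != expected_start or phase.last_step < phase.first_step:
            raise ValueError(f"phase {phase.summary!r} is not contiguous")
        for hand, operation, _target in phase.operations:
            if hand not in ("left", "right"):
                raise ValueError(f"unknown hand: {hand}")
            if operation not in ATOMIC_OPERATIONS:
                raise ValueError(f"unknown atomic operation: {operation}")
        expected_start = phase.last_step + 1
    return expected_start


def phase_for_step(phases, step):
    """Return the phase that contains the given step index."""
    for phase in phases:
        if phase.first_step <= step <= phase.last_step:
            return phase
    raise IndexError(f"step {step} is outside the plan")
```

For example, `phase_for_step(BLOCK_STORAGE_PHASES, 7)` returns the handover phase, which is where the caption says the collision-free approach is planned.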