HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

Carmelo Sferrazza; Dun-Ming Huang; Xingyu Lin; Youngwoon Lee; Pieter Abbeel

TL;DR

HumanoidBench introduces a simulated benchmark built around a humanoid robot with two dexterous hands, comprising 15 whole-body manipulation tasks and 12 locomotion tasks designed to probe learning of high-dimensional, coordinated control. The study benchmarks multiple RL methods and shows that flat, end-to-end learning struggles on most tasks, whereas hierarchical reinforcement learning built on robust low-level skills achieves stronger performance. By combining whole-body tactile sensing, egocentric vision, and diverse task families, the platform exposes core challenges in long-horizon planning and multi-limb coordination, guiding future algorithmic development. The open-source environment and extensive ablations provide a testbed for advancing humanoid locomotion and manipulation research, with potential for sim-to-real extensions and multimodal perception studies.

Abstract

Humanoid robots hold great promise for assisting humans in diverse environments and tasks, owing to the flexibility and adaptability afforded by their human-like morphology. However, research on humanoid robots is often bottlenecked by costly and fragile hardware setups. To accelerate algorithmic research on humanoid robots, we present HumanoidBench, a high-dimensional simulated robot learning benchmark featuring a humanoid robot equipped with dexterous hands and a variety of challenging whole-body manipulation and locomotion tasks. Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies, such as walking or reaching. With HumanoidBench, we provide the robotics community with a platform for identifying the challenges that arise when solving diverse tasks with humanoid robots, facilitating rapid verification of algorithms and ideas. The open-source code is available at https://humanoid-bench.github.io.
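The hierarchical setup described in the abstract can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a high-level policy emits a command (for instance a walking or reaching target) every few control steps and that a pretrained, frozen low-level skill turns the current observation and that command into joint actions. The class name `HierarchicalPolicy`, the callable interface, and the fixed re-planning `period` are hypothetical choices, not HumanoidBench's API or the paper's exact architecture.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Illustrative types: observations, commands, and actions are plain float lists.
Obs = List[float]
Command = List[float]
Action = List[float]


@dataclass
class HierarchicalPolicy:
    """Hypothetical two-level controller (a sketch, not HumanoidBench's code)."""
    high_level: Callable[[Obs], Command]         # trained on the task reward
    low_level: Callable[[Obs, Command], Action]  # pretrained skill, kept frozen
    period: int = 10                             # assumed re-planning interval
    _command: Optional[Command] = field(default=None, init=False)
    _t: int = field(default=0, init=False)

    def reset(self) -> None:
        """Clear the cached command and step counter at the start of an episode."""
        self._command = None
        self._t = 0

    def act(self, obs: Obs) -> Action:
        # The high-level policy chooses a new command every `period` steps ...
        if self._t % self.period == 0:
            self._command = self.high_level(obs)
        self._t += 1
        # ... and the low-level skill maps (obs, command) to joint actions.
        return self.low_level(obs, self._command)


# Toy usage: the high-level policy sets a target, the low-level skill tracks it.
high_calls: List[float] = []


def toy_high_level(obs: Obs) -> Command:
    high_calls.append(obs[0])
    return [0.5 * obs[0]]        # e.g. a target proportional to a goal offset


def toy_low_level(obs: Obs, cmd: Command) -> Action:
    return [cmd[0] - obs[1]]     # proportional tracking of the commanded value


policy = HierarchicalPolicy(toy_high_level, toy_low_level, period=3)
actions = [policy.act([2.0, float(t)]) for t in range(5)]
print(actions)  # [[1.0], [0.0], [-1.0], [-2.0], [-3.0]]
```

Keeping the low-level skill fixed means the high-level policy only has to search over a small command space instead of the full joint-action space, which is one plausible reason such a decomposition helps when the low-level skills are robust.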

Paper Structure (51 sections, 33 equations, 11 figures, 6 tables)

Introduction
Related Work
Simulated Humanoid Robot Environment
HumanoidBench
Locomotion Tasks
Whole-Body Manipulation Tasks
Benchmarking Results
Baselines
Results
With Hands vs. Alternative Configurations
Flat vs. Hierarchical Reinforcement Learning
Common Failures
Conclusion
Additional Components
Simulated Environment Details
...and 36 more sections

Figures (11)

Figure 1: Example egocentric visual (top-left) and whole-body tactile (right) observations when the humanoid interacts with a package in the truck environment. In the right figure, the two cameras on the robot head are highlighted in green, while continuous tactile pressure readings are indicated in shades of red (strong pressure) and yellow (mild pressure). For ease of visualization, shear forces and tactile readings on the back of the robot are not shown, although both are implemented in our environment.
Figure 2: HumanoidBench manipulation task suite. We devise 15 whole-body manipulation tasks for benchmarking, covering a wide variety of interactions and difficulties. This figure illustrates an initial state for each task (left) and examples of the robot performing such tasks (right).
Figure 3: HumanoidBench locomotion task suite. We devise 12 locomotion tasks for benchmarking, covering a wide variety of interactions and difficulties. This figure illustrates an initial state for each task (left) and examples of the robot performing such tasks (right).
Figure 4: Learning curves of RL algorithms (locomotion). The curves are averaged over three random seeds and the shaded regions represent the standard deviation. Returns are computed by summing the rewards at all timesteps of an episode. The dashed lines qualitatively indicate task success. PPO was also run on the walk task, but its curve is not visible in the plot because it achieves only very low returns.
Figure 5: Learning curves of RL algorithms (manipulation). The curves are averaged over three random seeds and the shaded regions represent the standard deviation. The dashed lines qualitatively indicate task success. Kitchen is the only environment with a purely discrete, sparse reward; its maximum is 4.
...and 6 more figures
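The quantities plotted in Figures 4 and 5 can be sketched in Python: an episode's return is the sum of its per-timestep rewards, and each curve is averaged over three random seeds with a standard-deviation band. The sketch also includes a hypothetical model of kitchen's purely discrete, sparse reward. The use of the population rather than the sample standard deviation, the +1-per-newly-completed-subtask reward rule, and the four placeholder subtask names are assumptions for illustration, not details given in this section.

```python
import statistics
from typing import Sequence, Set, Tuple


def episode_return(rewards: Sequence[float]) -> float:
    """Undiscounted return: the sum of the rewards at all timesteps of an episode."""
    return float(sum(rewards))


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean across seeds (the curve) and std (the shaded band) at one checkpoint.
    Assumes the population std (pstdev); statistics.stdev gives the sample std."""
    return statistics.fmean(values), statistics.pstdev(values)


def kitchen_step_reward(done_before: Set[str], done_now: Set[str]) -> float:
    """Hypothetical sparse, discrete reward: +1 per subtask newly completed at this
    step, so an episode's return can never exceed the number of subtasks (here 4)."""
    return float(len(done_now - done_before))


# Four placeholder subtask names (the actual kitchen subtasks are not listed here).
SUBTASKS = ("subtask_1", "subtask_2", "subtask_3", "subtask_4")

# Completed-subtask sets observed at consecutive timesteps of one toy episode.
progress = [set(), {"subtask_1"}, {"subtask_1"}, {"subtask_1", "subtask_2"}]
rewards = [kitchen_step_reward(a, b) for a, b in zip(progress, progress[1:])]
ret = episode_return(rewards)  # 2.0: two of the four subtasks completed

# Aggregate toy returns from three seeds into one point of a learning curve.
mean, std = mean_and_std([2.0, 4.0, 3.0])
print(rewards, ret, mean, round(std, 4))
```

Because every reward is a whole number of newly completed subtasks, the return in this sketch moves in unit steps and is bounded by 4, which matches the discrete, capped shape described for the kitchen curve.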
