ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Arth Shukla, Stone Tao, Hao Su

TL;DR

This paper introduces ManiSkill-HAB (MS-HAB), a GPU-accelerated, open-source benchmark for low-level home-rearrangement tasks that unifies fast, realistic simulation with HAB task suites. It provides a scalable framework including GPU-backed environments, per-object RL policies, IL baselines, and an automated trajectory labeling system to sample demonstrations under safety constraints. The work demonstrates substantial speedups over Habitat 2.0, enables extensive data generation, and furnishes a comprehensive set of baselines and ablations to study subtask success and long-horizon performance. While not asserting real-robot transfer, MS-HAB offers a practical platform to advance low-level manipulation, skill chaining, and scene-level rearrangement research at scale.

Abstract

High-quality benchmarks are the foundation for embodied AI research, enabling significant advancements in long-horizon navigation, manipulation and rearrangement tasks. However, as frontier tasks in robotics get more advanced, they require faster simulation speed, more intricate test environments, and larger demonstration datasets. To this end, we present MS-HAB, a holistic benchmark for low-level manipulation and in-home object rearrangement. First, we provide a GPU-accelerated implementation of the Home Assistant Benchmark (HAB). We support realistic low-level control and achieve over 3x the speed of prior magical grasp implementations at a fraction of the GPU memory usage. Second, we train extensive reinforcement learning (RL) and imitation learning (IL) baselines for future work to compare against. Finally, we develop a rule-based trajectory filtering system to sample specific demonstrations from our RL policies which match predefined criteria for robot behavior and safety. Combining demonstration filtering with our fast environments enables efficient, controlled data generation at scale.
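The rule-based filtering idea in the abstract can be illustrated with a minimal sketch. This is not the MS-HAB implementation: the `Trajectory` fields, the label names, and the 50 N force limit are all hypothetical, chosen only to show how rollouts might be labeled by simple rules and then kept or discarded against predefined behavior and safety criteria.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical summary of one policy rollout (field names are illustrative)."""
    success: bool
    max_collision_force: float  # peak contact force seen during the episode, in N
    dropped_object: bool
    steps: int


def label(traj, force_limit=50.0):
    """Attach rule-based labels to a trajectory. The rules are assumptions."""
    labels = {"success" if traj.success else "failure"}
    if traj.max_collision_force > force_limit:
        labels.add("excessive_collision")
    if traj.dropped_object:
        labels.add("dropped_object")
    return labels


def filter_demos(trajs,
                 required=("success",),
                 forbidden=("excessive_collision", "dropped_object")):
    """Keep trajectories that carry every required label and no forbidden one."""
    kept = []
    for traj in trajs:
        tags = label(traj)
        if set(required) <= tags and not (set(forbidden) & tags):
            kept.append(traj)
    return kept


rollouts = [
    Trajectory(success=True, max_collision_force=10.0, dropped_object=False, steps=100),
    Trajectory(success=True, max_collision_force=80.0, dropped_object=False, steps=90),
    Trajectory(success=False, max_collision_force=5.0, dropped_object=False, steps=200),
    Trajectory(success=True, max_collision_force=20.0, dropped_object=True, steps=120),
]
demos = filter_demos(rollouts)
```

Here only the first rollout would be kept: the second exceeds the assumed force limit, the third fails, and the fourth drops the object. Separating labeling from filtering means the same labeled rollouts can be re-sampled under different criteria without re-running the simulator.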

Paper Structure

This paper contains 37 sections, 1 equation, 11 figures, 13 tables.

Figures (11)

  • Figure 1: Live-rendered frames taken from ManiSkill-HAB environments while running policy rollouts with skill chaining. Ray-tracing enabled. Full videos available on website.
  • Figure 2: Interact benchmark comparing MS-HAB (ours) with Habitat. Each data point is annotated with the number of parallel environments used. SPS and GPU memory usage for each data point are averaged over 10 seeds; error bars representing 95% CIs are plotted, but are too small to see. Thanks to GPU acceleration, MS-HAB scales parallel environments to achieve over 3x the performance of Habitat while using a fraction of the GPU memory.
  • Figure 3: Renders of low-level, whole-body control policies solving Pick, Place, Open, and Close subtasks. We render one 512x512 image and four 128x128 sensor images. Note the base's moving position relative to surroundings. Goal spheres are invisible to sensors. Full videos in supplementary.
  • Figure 4: Long-horizon task progressive completion rates (%) on train and validation splits averaged over 1000 episodes. Furthermore, we provide an 'upper bound' on performance based on the success rates of each subtask policy. Best viewed zoomed.
  • Figure 5: Per-object vs all-object RL success once rate (%) evaluation curves for Pick and Place policies across tasks. We run 3 seeds for each per-object policy and 3 seeds for the all-object policy. TidyHouse and PrepareGroceries involve 9 objects, while SetTable involves 2 objects. Since we group runs for different per-object policies into one curve, we use minimum and maximum for the shaded region. Best viewed zoomed.
  • ...and 6 more figures
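
The Figure 4 caption mentions an 'upper bound' derived from per-subtask success rates but does not state how it is computed. One plausible reading, given here only as an assumption, is that a chained task reaches stage k only if every earlier subtask succeeded. If the subtasks are treated as independent, the progressive completion rate is then the running product of the individual success rates. The function name and the example rates below are hypothetical.

```python
from itertools import accumulate
from operator import mul


def progressive_upper_bound(subtask_success_rates):
    """Running product of per-subtask success rates (values in [0, 1]).

    Assumes subtasks succeed independently. This is an illustrative
    simplification, not necessarily the computation used in the paper.
    """
    return list(accumulate(subtask_success_rates, mul))


# Hypothetical rates for a three-subtask chain, e.g. Pick -> Place -> Close.
bounds = progressive_upper_bound([0.9, 0.8, 0.5])
```

Under these made-up rates the bound falls from 0.9 after the first subtask to roughly 0.36 after the third, which shows why small per-subtask failure rates compound over long horizons.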