M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes

Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu

TL;DR

This work introduces M3Bench, a large-scale benchmark for whole-body motion generation in mobile manipulation within 3D scenes, and M3BenchMaker, an automatic data-generation tool that constructs coordinated base–arm trajectories from high-level task instructions. The framework assembles Task Builder, Conditional Scene Sampler, Goal Configuration Generator, and VKC Problem Generator to produce feasible demonstrations validated in Isaac Sim, across 119 scenes and 32 object types. Through extensive experiments comparing planning-based and learning-based methods, the authors show persistent challenges in achieving robust base–arm coordination under environmental constraints, with hybrid approaches offering limited gains. The contributions provide a scalable platform and data-generation capability to advance embodied AI toward more adaptive mobile manipulation in realistic environments.

Abstract

We propose M3Bench, a new benchmark for whole-body motion generation in mobile manipulation tasks. Given a 3D scene context, M3Bench requires an embodied agent to reason about its configuration, environmental constraints, and task objectives to generate coordinated whole-body motion trajectories for object rearrangement. M3Bench features 30,000 object rearrangement tasks across 119 diverse scenes, providing expert demonstrations generated by our newly developed M3BenchMaker, an automatic data generation tool that produces whole-body motion trajectories from high-level task instructions using only basic scene and robot information. Our benchmark includes various task splits to evaluate generalization across different dimensions and leverages realistic physics simulation for trajectory assessment. Extensive evaluation reveals that state-of-the-art models struggle to coordinate base-arm motion while adhering to environmental and task-specific constraints, underscoring the need for new models to bridge this gap. By releasing M3Bench and M3BenchMaker, we aim to advance robotics research toward more adaptive and capable mobile manipulation in diverse, real-world environments.

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 4 tables, 2 algorithms.

Figures (3)

  • Figure 1: Illustration of whole-body motion trajectories in 3D scenes. (a) Treating the mobile base and arm as separate entities can lead to two typical failures: a nearby navigable position may be impractical for the arm to reach the object (red), and a feasible grasp pose may be unachievable due to the robot's embodiment and environmental constraints (orange). (b) Our tool generates feasible whole-body motion trajectories from high-level instructions, requiring only the action type, target object, and URDF files of the scene and robot. The green overlay illustrates a generated trajectory for the "pick that salt shaker" task.
  • Figure 2: Overview of M3BenchMaker. The Task Builder lets users specify manipulation tasks through high-level definitions consisting of URDF files, a target object link, and an action. The Conditional Scene Sampler augments data by generating object and robot poses (blue outline) relative to the supporting planes (green outline) of the target objects (red outline). The Goal Configuration Generator produces task-specific goal poses using a pre-trained model for grasp/placement candidates. The VKC Problem Generator constructs optimization programs that compute whole-body motion trajectories satisfying task objectives and constraints via VKC (Jiao et al., 2021).
  • Figure 3: An illustration of metadata.
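
The captions for Figures 1 and 2 say a task is defined by only an action type, a target object link, and URDF files for the scene and robot. As a rough illustration of how small that input is, the sketch below models it as a Python record. Every name here (`TaskSpec`, `Action`, the field names, the file paths) is hypothetical and is not taken from M3BenchMaker, and the choice of `pick` and `place` as actions is an assumption based on the object-rearrangement setting.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical action types; assumed from the pick/place rearrangement setting."""
    PICK = "pick"
    PLACE = "place"


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical high-level task definition: URDF files, target link, action."""
    scene_urdf: str   # path to the scene URDF (illustrative)
    robot_urdf: str   # path to the robot URDF (illustrative)
    target_link: str  # name of the target object's link in the scene URDF
    action: Action

    def __post_init__(self) -> None:
        # Reject an empty target link early, before any sampling or planning.
        if not self.target_link:
            raise ValueError("target_link must be a non-empty link name")

    def instruction(self) -> str:
        """Render the spec as a short human-readable instruction."""
        return f"{self.action.value} {self.target_link}"


# Example: the "pick that salt shaker" task from the Figure 1 caption.
spec = TaskSpec(
    scene_urdf="scene.urdf",      # placeholder path
    robot_urdf="robot.urdf",      # placeholder path
    target_link="salt_shaker",    # placeholder link name
    action=Action.PICK,
)
print(spec.instruction())
```

In this sketch the record is frozen so that a specification cannot change once it is handed to later stages such as scene sampling or goal generation. How the actual tool represents tasks internally is not described in this section.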