MuBlE: MuJoCo and Blender simulation Environment and Benchmark for Task Planning in Robot Manipulation
Michal Nazarczuk, Karla Stepanova, Jan Kristof Behrens, Matej Hoffmann, Krystian Mikolajczyk
TL;DR
MuBlE addresses the challenge of developing embodied reasoning agents for long-horizon robot manipulation by coupling the MuJoCo physics engine with Blender-based photorealistic rendering within robosuite. It introduces SHOP-VRB2, a 12,000-scene multimodal benchmark demanding simultaneous visual and physical reasoning across ten task classes, plus data-generation tools for scenes, instructions, and ground-truth annotations. The authors demonstrate baselines on SHOP-VRB2 and real-world YCB scenes, showing meaningful sim-to-real transfer aided by high-fidelity rendering and accurate physics, while highlighting current difficulties in long-horizon manipulation. This work offers a scalable framework and benchmark to foster advances in closed-loop planning and multimodal understanding for robot manipulation, with practical impact on sim-to-real transfer and evaluation of embodied reasoning systems.
Abstract
Current embodied reasoning agents struggle to plan for long-horizon tasks that require physically interacting with the world to obtain the necessary information (e.g. 'sort the objects from lightest to heaviest'). Improving the capabilities of such agents depends heavily on the availability of relevant training environments. To facilitate the development of such systems, we introduce a novel simulation environment (built on top of robosuite) that uses the MuJoCo physics engine and the high-quality Blender renderer to provide realistic visual observations that are also consistent with the physical state of the scene. It is the first simulator focusing on long-horizon robot manipulation tasks while preserving accurate physics modeling. MuBlE can generate multimodal data for training and enables the design of closed-loop methods through environment interaction on two levels: a visual-action loop and a control-physics loop. Together with the simulator, we propose SHOP-VRB2, a new benchmark composed of 10 classes of multi-step reasoning scenarios that require simultaneous visual and physical measurements.
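
To illustrate the two interaction levels on the 'sort the objects from lightest to heaviest' example, the following is a minimal toy sketch in Python. It does not use MuBlE, robosuite, or MuJoCo: `ToyScene`, `move_gripper`, `sort_by_weight`, the gripper heights, and the object masses are all hypothetical names and values chosen for illustration. The outer loop stands in for the visual-action level (observe, choose an object, act), and the inner loop stands in for the control-physics level (step a controller until the gripper reaches its target).

```python
from dataclasses import dataclass


@dataclass
class ToyScene:
    """Hypothetical stand-in for a simulated scene; NOT the MuBlE API."""
    hidden_masses: dict          # object name -> mass in kg, hidden from vision
    gripper_height: float = 0.0  # metres above the table

    def observe(self):
        # "Visual" observation: object names are visible, masses are not.
        return sorted(self.hidden_masses)

    def step_control(self, target_height, gain=0.5):
        # One control-physics step: proportional move toward the target.
        self.gripper_height += gain * (target_height - self.gripper_height)
        return self.gripper_height

    def weigh(self, name):
        # The mass can only be measured while the object is lifted.
        if self.gripper_height < 0.09:
            raise RuntimeError("object must be lifted before weighing")
        return self.hidden_masses[name]


def move_gripper(scene, target, tol=0.01, max_steps=50):
    """Inner (control-physics) loop: step until the target height is reached."""
    for _ in range(max_steps):
        if abs(scene.step_control(target) - target) < tol:
            return True
    return False


def sort_by_weight(scene):
    """Outer (visual-action) loop: lift and weigh each visible object, then sort."""
    measured = {}
    for name in scene.observe():
        if not move_gripper(scene, target=0.1):
            raise RuntimeError(f"failed to lift {name}")
        measured[name] = scene.weigh(name)  # physical measurement
        move_gripper(scene, target=0.0)     # put the object back down
    return sorted(measured, key=measured.get)


scene = ToyScene({"mug": 0.3, "sponge": 0.05, "pan": 0.9})
print(sort_by_weight(scene))
```

The point of the sketch is only that the ordering cannot be read from the visual observation alone: each mass becomes available after a physical interaction, which is what makes such tasks long-horizon.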
