LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

Rui Li, Zixuan Hu, Wenxi Qu, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, Wanli Ouyang, Lei Bai, Wangmeng Zuo, Ling-Yu Duan, Dongzhan Zhou, Shixiang Tang

TL;DR

LabUtopia delivers a lab-specialized simulation and benchmark suite by fusing LabSim (chemical-aware physics), LabScene (procedural lab environments), and LabBench (five-level hierarchical tasks). The platform enables large-scale training and principled evaluation of embodied agents in scientific tasks, and experiments reveal persistent generalization and long-horizon planning challenges for current imitation-learning methods. By providing diverse assets, a reaction-aware physics engine, and structured evaluation, LabUtopia offers a rigorous testbed for perception-planning-control integration in automated laboratories. The work highlights the sim-to-real gap and the need for more diverse embodied architectures to achieve robust scientific automation. It thus sets the stage for targeted advances in generalizable, reasoning-capable lab agents and broader adoption of sim-to-real pipelines in chemistry and materials research.

Abstract

Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on perception of physical-chemical transformations and long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, the development of such agents has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.

Paper Structure

This paper contains 34 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The LabUtopia simulation environment and benchmark for developing scientific embodied agents in automated laboratories. LabUtopia supports chemical reaction modeling and provides diverse laboratory assets, forming a high-fidelity testbed for tasks of varying difficulty—from atomic actions to long-horizon action sequences involving both manipulation and navigation.
  • Figure 2: An overview of our laboratory simulation suite. LabScene automatically synthesizes scalable laboratory scenes using a diverse asset library and a procedural generation pipeline, while LabSim supports the simulation of high-fidelity physical and chemical interactions.
  • Figure 3: (a) Workspace for manipulation tasks. (b) Illustration of the navigation environment. (c) Bird’s-eye view of the navigation map. (d) Occupancy grid map used for navigation. (f) Occupancy grid map with the planned path highlighted.
  • Figure 4: An overview of our hierarchical benchmark. LabBench structures scientific tasks across five levels, from atomic manipulations to long-horizon experiments, enabling rigorous evaluation of embodied agents in realistic laboratory settings.