LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
Rui Li, Zixuan Hu, Wenxi Qu, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, Wanli Ouyang, Lei Bai, Wangmeng Zuo, Ling-Yu Duan, Dongzhan Zhou, Shixiang Tang
TL;DR
LabUtopia delivers a lab-specialized simulation and benchmark suite by fusing LabSim (chemical-aware physics), LabScene (procedural lab environments), and LabBench (five-level hierarchical tasks). The platform enables large-scale training and principled evaluation of embodied agents in scientific tasks, and experiments reveal persistent generalization and long-horizon planning challenges for current imitation-learning methods. By providing diverse assets, a reaction-aware physics engine, and structured evaluation, LabUtopia offers a rigorous testbed for perception-planning-control integration in automated laboratories. The work highlights the sim-to-real gap and the need for more diverse embodied architectures to achieve robust scientific automation. It thus sets the stage for targeted advances in generalizable, reasoning-capable lab agents and broader adoption of sim-to-real pipelines in chemistry and materials research.
Abstract
Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on perception of physical-chemical transformations and long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, the development of such agents has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.
