ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI
Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Viswesh Nagaswamy Rajesh, Yong Woo Choi, Yen-Ru Chen, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, Hao Su
TL;DR
ManiSkill3 addresses the data- and compute-hungry challenge of generalizable robotic manipulation by delivering a GPU-accelerated, highly scalable simulator with heterogeneous environments, an intuitive API, VR teleoperation, and robust sim2real/real2sim capabilities. It demonstrates unprecedented GPU-accelerated throughput (up to 30k+ FPS) and a dramatically lower memory footprint, enabling large-scale visual RL and offline/online imitation learning across diverse task categories and robots. The framework provides comprehensive baselines (PPO, TD-MPC2, BC, diffusion policies, PerAct, VLA models) and demonstration pipelines, plus digital-twin-oriented evaluation to bridge simulation and real-world performance. Overall, ManiSkill3 lowers the barrier to scaling embodied AI research, offers fast simulated proxies for assessing real-world transfer, and invites community contributions through its open-source design and extensive documentation.
Abstract
Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks support only a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open-source ManiSkill3, the fastest state-visual GPU-parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects, including simulation+rendering, heterogeneous simulation, point-cloud/voxel visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments thanks to minimal Python/PyTorch overhead in the system, simulation on the GPU, and the use of the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU-parallelized environments/tasks, spanning 12 distinct domains including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dexterous manipulation in realistic scenes designed by artists or real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines that span popular RL and learning-from-demonstrations algorithms.
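
To make the FPS figure concrete, the following is a minimal sketch of how aggregate throughput is commonly computed for a batched simulator: frames are counted across all parallel environments and divided by wall-clock time. It is not ManiSkill3's benchmarking code. The helper names `aggregate_fps` and `benchmark`, and the example numbers, are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable


def aggregate_fps(num_envs: int, num_steps: int, elapsed_s: float) -> float:
    """Frames per second summed over all parallel environments.

    One batched step advances every environment once, so it
    contributes `num_envs` frames to the total.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_envs * num_steps / elapsed_s


def benchmark(step_fn: Callable[[], None], num_envs: int, num_steps: int) -> float:
    """Time `num_steps` calls of a batched step function (hypothetical helper).

    Assumption: `step_fn` blocks until the step has finished. With
    asynchronous GPU execution, the device would need to be
    synchronized before reading the clock.
    """
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return aggregate_fps(num_envs, num_steps, elapsed)


# Illustrative numbers only: 1024 environments stepped 125 times in 4 s.
example_fps = aggregate_fps(num_envs=1024, num_steps=125, elapsed_s=4.0)
print(example_fps)  # 32000.0
```

Under this definition, throughput grows with the number of parallel environments as long as the time per batched step grows more slowly than the batch size, which is the regime GPU parallelization targets.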
