Towards Generalist Robots: A Promising Paradigm via Generative Simulation
Zhou Xian, Theophile Gervet, Zhenjia Xu, Yi-Ling Qiao, Tsun-Hsuan Wang, Yian Wang
TL;DR
This paper argues that achieving generalist robots requires scalable, diverse low-level skill learning in the physical world. It proposes generative simulation, a pipeline that uses multimodal foundation models to automatically generate tasks, scenes, and training supervisions in simulation; the same models aid high-level planning, while low-level policies are learned via RL or trajectory optimization. By replacing costly real-world data collection with automated, simulated data generation grounded in real-world distributions, the approach aims to scale policy learning and move toward a unified generalist robot. While the direction is promising, the paper acknowledges limitations in simulator realism, potential model gaps for complex tasks, and open challenges around sim-to-real transfer and human demonstrations.
Abstract
This document is a position paper that outlines the authors' vision for a potential pathway towards generalist robots. Its purpose is to share the authors' excitement with the community and to highlight a promising research direction in robotics and AI. The authors believe the proposed paradigm is a feasible path towards the long-standing goal of robotics research: deploying robots, or embodied AI agents more broadly, in various non-factory real-world settings to perform diverse tasks. The document presents a specific idea for mining the knowledge in the latest large-scale foundation models for robotics research. Instead of directly using or adapting these models to produce low-level policies and actions, it advocates a fully automated generative pipeline (termed generative simulation), which uses these models to generate diverse tasks, scenes, and training supervisions at scale, thereby scaling up low-level skill learning and ultimately leading to a foundation model for robotics that empowers generalist robots. The authors are actively pursuing this direction. At the same time, they recognize that the ambitious goal of building generalist robots through large-scale policy training demands significant resources, such as computing power and hardware, and that academic research groups alone may face severe resource constraints in implementing the entire vision. The authors therefore believe that sharing their thoughts at this early stage could foster discussion, attract interest in the proposed pathway and related topics from industry groups, and potentially spur significant technical advances in the field.
