A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

Xiaoxiao Long; Qingrui Zhao; Kaiwen Zhang; Zihao Zhang; Dingrui Wang; Yumeng Liu; Zhengjie Shu; Yi Lu; Shouzheng Wang; Xinzhe Wei; Wei Li; Wei Yin; Yao Yao; Jia Pan; Qiu Shen; Ruigang Yang; Xun Cao; Qionghai Dai

TL;DR

The paper proposes a five-level framework (IR-L0 to IR-L4) for evaluating humanoid robot autonomy and social cognition, and surveys the complementary roles of physical simulators and world models in embodied AI. It analyzes how simulators provide safe, controllable training environments while world models offer internal predictive capabilities for planning, reward inference, and long-horizon decision making. The review covers mobility, manipulation, and human-robot interaction, compares mainstream simulators and their physics and rendering capabilities, and surveys a wide range of world-model architectures (RSSM, JEPA, transformer- and diffusion-based) and applications (autonomous driving, articulated robots). The work highlights trends toward diffusion-based world models, multi-modal conditioning, and occupancy-based world representations, arguing that integrating external simulation with internal modeling is key to robust sim-to-real transfer and progress toward IR-L4 autonomy. It also provides a repository of up-to-date literature and emphasizes open challenges in data efficiency, generalization, causality, and evaluation in embodied AI.

Abstract

The pursuit of artificial general intelligence (AGI) has placed embodied intelligence at the forefront of robotics research. Embodied intelligence focuses on agents capable of perceiving, reasoning, and acting within the physical world. Achieving robust embodied intelligence requires not only advanced perception and control, but also the ability to ground abstract cognition in real-world interactions. Two foundational technologies, physical simulators and world models, have emerged as critical enablers in this quest. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, allowing safe and efficient development of complex behaviors. In contrast, world models empower robots with internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. This survey systematically reviews recent advances in learning embodied AI through the integration of physical simulators and world models. We analyze their complementary roles in enhancing autonomy, adaptability, and generalization in intelligent robots, and discuss the interplay between external simulation and internal modeling in bridging the gap between simulated training and real-world deployment. By synthesizing current progress and identifying open challenges, this survey aims to provide a comprehensive perspective on the path toward more capable and generalizable embodied AI systems. We also maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey.
