Understanding World or Predicting Future? A Comprehensive Survey of World Models

Jingtao Ding, Yunke Zhang, Yu Shang, Jie Feng, Yuheng Zhang, Zefang Zong, Yuan Yuan, Hongyuan Su, Nian Li, Jinghua Piao, Yucheng Deng, Nicholas Sukiennik, Chen Gao, Fengli Xu, Yong Li

TL;DR

This survey articulates a two-fold view of world models: constructing implicit internal representations to understand the world, and forecasting its future states through video and embodied simulations. It surveys foundational progress from model-based RL and JEPA to language-augmented and diffusion-based approaches, then maps these capabilities onto diverse domains including games, robotics, autonomous driving, urban analytics, and social simulacra. Key contributions include a formal two-function categorization, a taxonomy of implicit versus predictive world models, and a synthesis of current datasets, benchmarks, and application-specific innovations. The work also outlines open problems (physics fidelity, social dimensions, benchmarks, sim-to-real transfer, efficiency, and safety) and proposes concrete directions for achieving scalable, safe, and real-world-ready world models.

Abstract

The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. We first examine the current progress in these two categories. We then explore the application of world models in key domains, including generative games, autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these two functions. Finally, we outline key challenges and provide insights into potential future research directions. We summarize the representative papers along with their code repositories at https://github.com/tsinghua-fib-lab/World-Model.

Paper Structure

This paper contains 46 sections, 3 equations, 8 figures, and 9 tables.

Figures (8)

  • Figure 1: The roadmap of world models in the deep learning era.
  • Figure 2: The overall framework of this survey. We systematically define the essential purpose of a world model as understanding the dynamics of the external world and predicting future scenarios. The timeline illustrates the development of key definitions and applications.
  • Figure 3: Two schemes for utilizing world models in decision-making.
  • Figure 4: World knowledge in large language models for world models.
  • Figure 5: Classification of world models as interactive embodied environments, including indoor, outdoor, and dynamic environments. The modeling of the external world is evolving from constructing static, current environments to predicting dynamic, future environments.
  • ...and 3 more figures