Critiques of World Models

Eric Xing; Mingkai Deng; Jinyu Hou; Zhiting Hu

TL;DR

This work reframes world models as general-purpose simulators of actionable futures, arguing that current approaches overemphasize video generation at the expense of goal-directed reasoning. It analyzes five design dimensions (data, representation, architecture, objective, and usage) and presents theoretical and empirical critiques of leading world model (WM) approaches, notably JEPA, latent representations, and MPC/RL usage. Building on these critiques, it proposes the PAN architecture: a hierarchical, multimodal WM with mixed discrete-continuous representations, an enhanced LLM backbone, and a diffusion-based latent predictor, trained with observation-grounded generative losses to support long-horizon planning and agentic reasoning. PAN aims to provide efficient, flexible, and scalable imagined experience for training and informing autonomous agents; mountaineering and other complex tasks serve as motivating examples for broader generalization.

Abstract

A world model, the supposed algorithmic surrogate of the real-world environment that biological agents experience and act upon, has been an emerging topic in recent years because of the rising need to develop virtual agents with artificial (general) intelligence. There has been much debate about what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed sci-fi classic Dune and drawing inspiration from the concept of "hypothetical thinking" in the psychology literature, we offer critiques of several schools of thought on world modeling and argue that the primary goal of a world model is to simulate all actionable possibilities of the real world for purposeful reasoning and acting. Building on these critiques, we propose a new architecture for a general-purpose world model, based on hierarchical, multi-level, and mixed continuous/discrete representations and a generative, self-supervised learning framework, with an outlook toward a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.
