From Word to World: Can Large Language Models be Implicit Text-based World Models?
Yixia Li, Hongru Wang, Jiahao Qiu, Zhenfei Yin, Dongdong Zhang, Cheng Qian, Zeping Li, Pony Ma, Guanhua Chen, Heng Ji, Mengdi Wang
TL;DR
This work investigates whether large language models can serve as implicit text-based world simulators to enhance agent learning from interaction. By formalizing world modeling as multi-turn next-state prediction and evaluating across five diverse text environments, the study demonstrates that sufficiently trained LLMs can maintain coherent latent dynamics, scale with data and capacity, and improve downstream learning via verification, synthetic data, and warm-started RL. However, gains depend on behavioral coverage and environment complexity, limiting effectiveness in open-ended settings without grounding in real observations. The results establish a foundation for treating LLMs as general-purpose simulators of interactive worlds and suggest directions toward multimodal extensions beyond text.
Abstract
Agentic reinforcement learning increasingly relies on experience-driven scaling, yet real-world environments remain non-adaptive, limited in coverage, and difficult to scale. World models offer a potential way to improve learning efficiency through simulated experience, but it remains unclear whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents. We study these questions in text-based environments, which provide a controlled setting to reinterpret language modeling as next-state prediction under interaction. We introduce a three-level framework for evaluating LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility. Across five representative environments, we find that sufficiently trained world models maintain coherent latent state, scale predictably with data and model size, and improve agent performance via action verification, synthetic trajectory generation, and warm-starting reinforcement learning. Meanwhile, these gains depend critically on behavioral coverage and environment complexity, delineating clear boundry on when world modeling effectively supports agent learning.
