SimuRA: A World-Model-Driven Simulative Reasoning Architecture for General Goal-Oriented Agents
Mingkai Deng, Jinyu Hou, Zhiting Hu, Eric Xing
TL;DR
SimuRA introduces a world-model-driven simulative reasoning architecture that augments autoregressive LLM-based agents with explicit planning via an internal world model and a discrete, language-based latent state space. By separating perception, planning, and action and by simulating high-level actions, SimuRA targets robust, long-horizon decision making across diverse web-browsing tasks. World-model-based planning achieves up to 124% higher task completion rates than a matched black-box autoregressive baseline, and on flight search (FlightQA) SimuRA raises the success rate from 0% to 32.2% compared to a representative open-web agent baseline. The approach is demonstrated through ReasonerAgent-Web, an open-source research demo of a web-browsing agent, and is evaluated across complex website navigation, multi-hop QA, and general web automation benchmarks. The findings highlight the practical value of simulative reasoning for general-purpose agents and point to future work in efficiency, multimodal integration, and broader environmental applicability.
Abstract
AI agents built on foundation models hold enormous promise. Current practice, however, focuses on a one-task-one-agent approach, which not only falls short on scalability and generality, but also faces practical limitations from black-box autoregressive reasoning, where decisions unfold token by token without explicit simulation or counterfactual evaluation of outcomes. Humans, on the other hand, reason and plan by mentally simulating the consequences of actions within an internal model of the world, a capability that supports flexible, goal-directed behavior across diverse contexts. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any general environment, SimuRA addresses the limitations of black-box autoregressive reasoning by incorporating a world model for planning via simulation. Our prototype world model is implemented using LLMs as a substrate, leveraging natural language as a discrete, hierarchical representation grounded in concepts for planning, while remaining model-agnostic. On complex web-browsing tasks such as flight search, SimuRA improves the success rate from 0% to 32.2% compared to a representative open-web agent baseline. Across tasks, world-model-based planning achieves up to 124% higher task completion rates than a matched black-box autoregressive baseline, demonstrating the advantages of simulative reasoning. We release ReasonerAgent-Web, a web-browsing agent built on SimuRA, as an open-source research demo.
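To make the idea of planning via simulation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes three hypothetical components that an LLM could fill in a real agent: a proposer of candidate high-level actions, a world model that predicts the next natural-language state, and a critic that scores a state against the goal. Here all three are deterministic toy stand-ins over a counter, and every name (`plan_by_simulation`, `toy_propose`, `toy_world_model`, `toy_critic`) is illustrative only.

```python
from typing import Callable, List

State = str   # assumption: states are short natural-language summaries
Action = str  # assumption: actions are high-level and described in language


def plan_by_simulation(
    state: State,
    goal: str,
    propose: Callable[[State, str], List[Action]],
    world_model: Callable[[State, Action], State],
    critic: Callable[[State, str], float],
    depth: int = 2,
) -> Action:
    """Pick the action whose simulated rollouts reach the best-scoring state."""

    def best_value(s: State, d: int) -> float:
        # Score of the best state reachable from s within d more simulated steps.
        value = critic(s, goal)
        if d == 0:
            return value
        for a in propose(s, goal):
            value = max(value, best_value(world_model(s, a), d - 1))
        return value

    candidates = propose(state, goal)
    return max(candidates, key=lambda a: best_value(world_model(state, a), depth - 1))


# Toy stand-ins (hypothetical): the "environment" is a counter written as text.
def toy_propose(state: State, goal: str) -> List[Action]:
    return ["increment", "double"]


def toy_world_model(state: State, action: Action) -> State:
    n = int(state.split("=")[1])
    return f"count={n + 1 if action == 'increment' else n * 2}"


def toy_critic(state: State, goal: str) -> float:
    # Higher is better: negative distance between the current and target counts.
    return -abs(int(state.split("=")[1]) - int(goal.split("=")[1]))


if __name__ == "__main__":
    args = ("count=3", "count=8", toy_propose, toy_world_model, toy_critic)
    print(plan_by_simulation(*args, depth=2))  # two-step lookahead
    print(plan_by_simulation(*args, depth=1))  # one-step (greedy) lookahead
```

In this toy setting, one-step lookahead prefers "double" because 6 is closer to 8 than 4 is, whereas two-step simulation prefers "increment" because 4 can then be doubled to reach 8 exactly. This illustrates, under the stated assumptions, how evaluating simulated outcomes can change the chosen action relative to a purely reactive choice.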
