SimuRA: A World-Model-Driven Simulative Reasoning Architecture for General Goal-Oriented Agents

Mingkai Deng, Jinyu Hou, Zhiting Hu, Eric Xing

TL;DR

SimuRA is a world-model-driven simulative reasoning architecture that augments autoregressive LLM-based agents with explicit planning via a learned internal world model and a discrete, language-based latent state space. By separating perception, planning, and action, and by planning over high-level simulated actions, SimuRA supports robust, long-horizon decision making across diverse web-browsing tasks, outperforming a matched autoregressive baseline by up to 124% in task completion rate and reaching a 32.2% success rate on FlightQA. The approach is demonstrated through ReasonerAgent-Web, an open-source research demo, and is evaluated on complex website navigation, multi-hop QA, and general web automation benchmarks. The findings highlight the practical value of simulative reasoning for general-purpose agents and point to future work on efficiency, multimodal integration, and applicability to a broader range of environments.

Abstract

AI agents built on foundation models hold enormous promise. Current practice, however, focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also faces practical limitations from black-box autoregressive reasoning, where decisions unfold token by token without explicit simulation or counterfactual evaluation of outcomes. Humans, on the other hand, reason and plan by mentally simulating the consequences of actions within an internal model of the world, a capability that supports flexible, goal-directed behavior across diverse contexts. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any general environment, SimuRA addresses the limitations of black-box autoregressive reasoning by incorporating a world model for planning via simulation. Our prototype world model is implemented using LLMs as a substrate, leveraging natural language as a discrete, hierarchical representation grounded in concepts for planning, while remaining model-agnostic. On complex web-browsing tasks such as flight search, SimuRA improves the success rate from 0% to 32.2% compared to a representative open-web agent baseline. Across tasks, world-model-based planning achieves up to 124% higher task completion rates than a matched black-box autoregressive baseline, demonstrating the advantages of simulative reasoning. We release ReasonerAgent-Web, a web-browsing agent built on SimuRA, as an open-source research demo.
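To make the contrast between simulative and black-box autoregressive planning concrete, the following is a minimal sketch and not the SimuRA implementation. It assumes three hypothetical callables (`propose`, `simulate`, `evaluate`) standing in for a policy, a world model, and a critic, with states and actions represented as plain natural-language strings. The toy stubs replace what would be LLM calls in practice; every name here is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical types: states and simulated actions are natural-language strings.
State = str
Action = str


@dataclass
class SimulativePlanner:
    """Illustrative planner; the three callables are assumed stand-ins."""
    propose: Callable[[State, str], Sequence[Action]]  # candidate actions
    simulate: Callable[[State, Action], State]         # world model: predicted next state
    evaluate: Callable[[State, str], float]            # critic: progress toward the goal

    def plan(self, state: State, goal: str) -> Tuple[Action, float]:
        """Score each candidate by its simulated outcome and return the best."""
        candidates = list(self.propose(state, goal))
        if not candidates:
            raise ValueError("no candidate actions were proposed")
        scored = [
            (action, self.evaluate(self.simulate(state, action), goal))
            for action in candidates
        ]
        return max(scored, key=lambda pair: pair[1])


def autoregressive_plan(propose, state: State, goal: str) -> Action:
    """Baseline: commit to the first sampled action without simulating it."""
    return list(propose(state, goal))[0]


# Toy stubs (assumptions for illustration; a real agent would query an LLM).
def toy_propose(state: State, goal: str) -> Sequence[Action]:
    return ["click 'Sign in'", "type destination into search box"]


def toy_simulate(state: State, action: Action) -> State:
    return f"{state} -> after {action}"


def toy_evaluate(state: State, goal: str) -> float:
    return 1.0 if "search box" in state else 0.0


planner = SimulativePlanner(toy_propose, toy_simulate, toy_evaluate)
best_action, best_score = planner.plan("flight search homepage", "find a flight to Tokyo")
print(best_action, best_score)
```

In this toy setup the baseline commits to the first proposal, whereas the simulative planner compares predicted outcomes before choosing. A one-step lookahead is used for brevity; deeper rollouts would repeat the propose/simulate/evaluate loop on each predicted state.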

Paper Structure

This paper contains 36 sections, 8 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Demo of tasks performed using a web-browsing agent built on SimuRA with simulative planning using a world model.
  • Figure 2: A possible definition of an optimal agent
  • Figure 3: An agent in the real world, where the ground-truth world state and universe are unavailable to experience or experiment on, so a world model is crucial for simulation. As discussed in the agent design subsection, separating simulated actions $a_t'$ for planning from concrete actions $a_t$ for execution facilitates transfer and hierarchical planning, yielding more diverse and grounded actions and, in turn, better task success.
  • Figure 4: Illustration of the design of a simulative reasoning agent with concept-based latent states and hierarchical planning.
  • Figure 5: LLM-based implementation of SimuRA for web-related tasks (e.g., multi-website QA and flight search). The planner is where we implement our proposed world-model-based planning. We also implement a baseline that simply samples the plan from a language model (i.e., autoregressive planning).
  • ...and 4 more figures