World-Model-Augmented Web Agents with Action Correction

Zhouzhou Shen, Xueyu Hu, Xiyun Li, Tianqing Fang, Juncheng Li, Shengyu Zhang

TL;DR

World-Model-Augmented Web Agents with Action Correction (WAC) tackles two core problems in web automation: the cognitive isolation of single-model reasoning and the lack of explicit pre-execution risk checks. It introduces a two-stage framework. First, a world model acts as an environment expert that guides action generation on demand. Second, a world-model-centered deduction chain simulates the outcomes of candidate actions and feeds back refinements before any action is executed. On VisualWebArena and Online-Mind2Web, WAC improves consistently over ReAct and WebDreamer, and the gains generalize across models; ablations confirm that collaborative generation and feedback-based refinement each contribute. By integrating environment dynamics into both action planning and pre-execution validation, the approach makes web automation more robust and risk-aware. The authors plan to release the code as open source.

Abstract

Web agents based on large language models have shown promising capability in automating web tasks. However, current web agents struggle to reason out sensible actions because they are limited in predicting environment changes, and they may lack comprehensive awareness of execution risks, prematurely performing risky actions that cause losses and lead to task failure. To address these challenges, we propose WAC, a web agent that integrates model collaboration, consequence simulation, and feedback-driven action refinement. To overcome the cognitive isolation of individual models, we introduce a multi-agent collaboration process in which an action model consults a world model, acting as a web-environment expert, for strategic guidance; the action model then grounds these suggestions into executable actions, leveraging prior knowledge of environmental state transition dynamics to improve candidate action proposals. To achieve risk-aware, resilient task execution, we introduce a two-stage deduction chain: a world model, specialized in environmental state transitions, simulates action outcomes, and a judge model then scrutinizes them, triggering corrective feedback on the action when necessary. Experiments show that WAC achieves absolute gains of 1.8% on VisualWebArena and 1.3% on Online-Mind2Web.

Paper Structure (24 sections, 4 equations, 4 figures, 6 tables, 1 algorithm)

Figures (4)

  • Figure 1: Human and web agent task success rates across representative web benchmarks. Despite recent progress, state-of-the-art web agents still substantially underperform humans.
  • Figure 2: Overview of WAC. Given the current observation and task, the agent first decides whether world-model assistance is needed for action generation. Candidate actions are then proposed and lightly filtered. Each candidate is simulated by a world model to predict potential state changes, which are evaluated by a judge model to assign confidence scores. If no action exceeds a predefined threshold, feedback derived from low-scoring simulations is used to refine action proposals in a closed loop. Once a high-confidence action is identified, it is executed in the environment, and the process repeats until termination.
  • Figure 3: Comparison of action generation and pre-execution decision processes used by ReAct, WebDreamer, and our method (WAC). While ReAct directly executes a single proposed action and WebDreamer selects among simulated candidates, WAC enables collaborative action generation and feedback-driven action refinement prior to execution, leading to more robust action choices.
  • Figure 4: An illustrative case where feedback-driven action refinement corrects an initially proposed action at the first execution step. Although only a single candidate action is initially generated, simulated outcome evaluation identifies it as risky, triggering refinement and leading to a revised action that places the agent on a successful trajectory.
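
The closed loop described in the Figure 2 caption (propose, simulate, judge, refine, execute) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `choose_action`, `ScoredAction`, the callable signatures, the threshold of 0.7, the round limit, and the fallback to the best-scoring action seen are all assumptions made for the example. The steps that decide whether world-model assistance is needed and that lightly filter candidates are omitted for brevity. The toy stand-ins at the end mimic the situation in Figure 4, where a single risky first proposal is revised after feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ScoredAction:
    action: str
    predicted_outcome: str
    score: float


def choose_action(
    observation: str,
    task: str,
    propose: Callable[[str, str, List[str]], List[str]],  # action model (hypothetical signature)
    simulate: Callable[[str, str], str],                  # world model (hypothetical signature)
    judge: Callable[[str, str, str], float],              # judge model (hypothetical signature)
    threshold: float = 0.7,  # assumed value; the source only says "predefined threshold"
    max_rounds: int = 3,     # assumed cap on refinement rounds
) -> ScoredAction:
    """Return a candidate whose simulated outcome the judge scores at or above threshold."""
    feedback: List[str] = []
    best_overall: Optional[ScoredAction] = None
    for _ in range(max_rounds):
        # Simulate and score every candidate proposed in this round.
        scored = []
        for action in propose(observation, task, feedback):
            outcome = simulate(observation, action)
            scored.append(ScoredAction(action, outcome, judge(task, action, outcome)))
        if not scored:
            break
        round_best = max(scored, key=lambda s: s.score)
        if best_overall is None or round_best.score > best_overall.score:
            best_overall = round_best
        if round_best.score >= threshold:
            return round_best
        # No candidate is confident enough: turn the low-scoring simulations
        # into feedback for the next round of proposals.
        feedback = [
            f"{s.action!r} scored {s.score:.2f}; predicted outcome: {s.predicted_outcome}"
            for s in scored
        ]
    if best_overall is None:
        raise ValueError("propose() returned no candidate actions")
    # Assumed fallback: after max_rounds, use the best-scoring action seen.
    return best_overall


# Toy stand-ins for the three models, for illustration only.
def toy_propose(observation: str, task: str, feedback: List[str]) -> List[str]:
    return ["click 'Order history'"] if feedback else ["click 'Delete account'"]


def toy_simulate(observation: str, action: str) -> str:
    return f"page after {action}"


def toy_judge(task: str, action: str, outcome: str) -> float:
    return 0.1 if "Delete" in action else 0.9
```

Calling `choose_action("account page", "find my last order", toy_propose, toy_simulate, toy_judge)` rejects the first proposal in round one and returns the revised action in round two. Passing the models in as plain callables keeps the loop independent of any particular LLM client.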