World-Model-Augmented Web Agents with Action Correction
Zhouzhou Shen, Xueyu Hu, Xiyun Li, Tianqing Fang, Juncheng Li, Shengyu Zhang
TL;DR
World-Model-Augmented Web Agents with Action Correction (WAC) tackles two core problems in web automation: the cognitive isolation that arises when a single model reasons alone, and the lack of explicit risk checks before actions are executed. It introduces a two-stage framework. First, a world model acts as an environment expert that guides action generation on demand. Second, a world-model-centered deduction chain simulates the outcomes of proposed actions and feeds back refinements before any action is executed. On VisualWebArena and Online-Mind2Web, WAC consistently improves over ReAct and WebDreamer, and the gains carry over across different models. Ablations confirm that collaborative generation and feedback-based refinement each contribute distinct value. By integrating environment dynamics into both action planning and pre-execution validation, the approach advances robust, risk-aware web automation. The authors plan to release the code as open source to support adoption and further research.
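The first stage, in which the world model guides action generation on demand, reduces to a small control flow. The Python below is a minimal sketch under stated assumptions and not the authors' implementation. The names `propose_actions`, `needs_advice`, `consult_world_model`, `ground_actions` and `Proposal` are hypothetical. The toy stub functions stand in for LLM calls. The rule the action model uses to decide when to consult the world model is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interfaces: each "model" is a plain function standing in for an LLM call.


@dataclass
class Proposal:
    candidates: List[str]           # executable actions proposed by the action model
    guidance: Optional[str] = None  # world-model advice, or None if it was not consulted


def propose_actions(
    task: str,
    observation: str,
    needs_advice: Callable[[str, str], bool],
    consult_world_model: Callable[[str, str], str],
    ground_actions: Callable[[str, str, Optional[str]], List[str]],
) -> Proposal:
    """Stage 1 sketch: consult the world model on demand, then ground its advice."""
    guidance = None
    if needs_advice(task, observation):
        # The world model, acting as a web-environment expert, returns strategic guidance.
        guidance = consult_world_model(task, observation)
    # The action model turns the (optional) guidance into concrete candidate actions.
    return Proposal(candidates=ground_actions(task, observation, guidance), guidance=guidance)


# Toy stubs for illustration only (hypothetical, not real models).
def toy_needs_advice(task: str, observation: str) -> bool:
    return "search box" in observation


def toy_world_model(task: str, observation: str) -> str:
    return "Use the search box to find the item, then submit the query."


def toy_ground(task: str, observation: str, guidance: Optional[str]) -> List[str]:
    if guidance is not None and "search box" in guidance:
        return ["type [search box] red mug", "click [search button]"]
    return ["scroll down"]


proposal = propose_actions(
    "Buy a red mug", "Home page with a search box",
    toy_needs_advice, toy_world_model, toy_ground,
)
print(proposal.candidates)  # ['type [search box] red mug', 'click [search button]']
```

The consultation is optional by design. When `needs_advice` returns `False`, the action model proposes actions on its own. This mirrors the "on demand" framing without committing to any particular trigger rule.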
Abstract
Web agents based on large language models have demonstrated promising capability in automating web tasks. However, current web agents struggle to reason out sensible actions because they have limited ability to predict environment changes. They may also lack comprehensive awareness of execution risks, prematurely performing risky actions that cause losses and lead to task failure. To address these challenges, we propose WAC, a web agent that integrates model collaboration, consequence simulation, and feedback-driven action refinement. To overcome the cognitive isolation of individual models, we introduce a multi-agent collaboration process in which an action model consults a world model, acting as a web-environment expert, for strategic guidance. The action model then grounds these suggestions into executable actions, drawing on prior knowledge of environmental state-transition dynamics to improve candidate action proposals. To achieve risk-aware, resilient task execution, we introduce a two-stage deduction chain: a world model specialized in environmental state transitions simulates action outcomes, and a judge model then scrutinizes these outcomes, triggering corrective feedback on the action when necessary. Experiments show that WAC achieves absolute gains of 1.8% on VisualWebArena and 1.3% on Online-Mind2Web.
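The two-stage deduction chain works as a check-and-correct loop around each candidate action. The Python below is a minimal sketch under stated assumptions and not the paper's code. `simulate`, `judge` and `refine` are hypothetical callables standing in for the world model, the judge model and the action model. Several details are illustrative choices rather than confirmed details of WAC:

- the `Verdict` type
- the `max_rounds` correction budget
- trying candidates in order
- returning `None` when no candidate passes

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    safe: bool          # judge's decision on the simulated outcome
    feedback: str = ""  # corrective feedback when the action is rejected


def deduce_and_correct(
    task: str,
    observation: str,
    candidates: List[str],
    simulate: Callable[[str, str], str],          # world model: predicted next state
    judge: Callable[[str, str, str], Verdict],    # judge model: accept or reject
    refine: Callable[[str, str, str, str], str],  # action model: revise using feedback
    max_rounds: int = 2,                          # hypothetical correction budget
) -> Optional[str]:
    """Return the first action whose simulated outcome the judge accepts, else None."""
    for action in candidates:
        current = action
        for round_idx in range(max_rounds + 1):
            # Stage 1: predict the consequence without touching the real page.
            predicted = simulate(observation, current)
            # Stage 2: the judge scrutinizes the predicted outcome.
            verdict = judge(task, current, predicted)
            if verdict.safe:
                return current
            if round_idx == max_rounds:
                break  # correction budget for this candidate is used up
            current = refine(task, observation, current, verdict.feedback)
    return None


# Toy stubs for illustration only (hypothetical, not real models).
def toy_simulate(observation: str, action: str) -> str:
    if "delete" in action:
        return "Account deleted; all order history lost."
    return "Search results for red mug are shown."


def toy_judge(task: str, action: str, predicted: str) -> Verdict:
    if "deleted" in predicted:
        return Verdict(False, "This action is irreversible and unrelated to the task.")
    return Verdict(True)


def toy_refine(task: str, observation: str, action: str, feedback: str) -> str:
    return "type [search box] red mug"


chosen = deduce_and_correct(
    "Buy a red mug", "Home page", ["click [delete account]"],
    toy_simulate, toy_judge, toy_refine,
)
print(chosen)  # type [search box] red mug
```

Simulation and judgment are kept as separate calls. This mirrors the split between predicting what an action will do and deciding whether that outcome is acceptable. In this sketch, no action is executed until the judge approves its simulated result.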
