Table of Contents
Fetching ...

AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

Yifan Wu, Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang, Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen, Yuyao Zhai, Bang Liu, Chenglin Wu, Yuyu Luo

TL;DR

This work proposes AutoWebWorld, a novel framework for synthesizing controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and use coding agents to translate FSMs into interactive websites and observes a clear scaling law: as the synthetic data volume increases, performance on WebVoyager and Online-Mind2Web consistently improves.

Abstract

The performance of autonomous Web GUI agents heavily relies on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. The underlying state transitions are hidden, leading to reliance on inconsistent and costly external verifiers to evaluate step-level correctness. To address this, we propose AutoWebWorld, a novel framework for synthesizing controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and use coding agents to translate FSMs into interactive websites. Unlike real websites, where state transitions are implicit, AutoWebWorld explicitly defines all states, actions, and transition rules. This enables programmatic verification: action correctness is checked against predefined rules, and task success is confirmed by reaching a goal state in the FSM graph. AutoWebWorld enables a fully automated search-and-verify pipeline, generating over 11,663 verified trajectories from 29 diverse web environments at only $0.04 per trajectory. Training on this synthetic data significantly boosts real-world performance. Our 7B Web GUI agent outperforms all baselines within 15 steps on WebVoyager. Furthermore, we observe a clear scaling law: as the synthetic data volume increases, performance on WebVoyager and Online-Mind2Web consistently improves.

AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

TL;DR

This work proposes AutoWebWorld, a novel framework for synthesizing controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and use coding agents to translate FSMs into interactive websites and observes a clear scaling law: as the synthetic data volume increases, performance on WebVoyager and Online-Mind2Web consistently improves.

Abstract

The performance of autonomous Web GUI agents heavily relies on the quality and quantity of their training data. However, a fundamental bottleneck persists: collecting interaction trajectories from real-world websites is expensive and difficult to verify. The underlying state transitions are hidden, leading to reliance on inconsistent and costly external verifiers to evaluate step-level correctness. To address this, we propose AutoWebWorld, a novel framework for synthesizing controllable and verifiable web environments by modeling them as Finite State Machines (FSMs) and use coding agents to translate FSMs into interactive websites. Unlike real websites, where state transitions are implicit, AutoWebWorld explicitly defines all states, actions, and transition rules. This enables programmatic verification: action correctness is checked against predefined rules, and task success is confirmed by reaching a goal state in the FSM graph. AutoWebWorld enables a fully automated search-and-verify pipeline, generating over 11,663 verified trajectories from 29 diverse web environments at only $0.04 per trajectory. Training on this synthetic data significantly boosts real-world performance. Our 7B Web GUI agent outperforms all baselines within 15 steps on WebVoyager. Furthermore, we observe a clear scaling law: as the synthetic data volume increases, performance on WebVoyager and Online-Mind2Web consistently improves.
Paper Structure (58 sections, 12 equations, 7 figures, 8 tables)

This paper contains 58 sections, 12 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Comparison of GUI trajectory collection between existing methods and AutoWebWorld.
  • Figure 2: The four-step generation process of AutoWebWorld. Step 1 is to generate an FSM based on a multi-agent architecture. Step 2 uses coding agents to translate the output FSM into Synthesized Web. Step 3 uses BFS to explore the FSM graph and get all the potential trajectories. Step 4 filters these BFS-generated candidates by replaying each trajectory in the synthesized website with Playwright and retaining only those that execute all steps successfully and reach the intended goal state.
  • Figure 3: A verified, multi-step GUI trajectory for creating a repository in a synthesized GitHub environment. The action sequence is automatically generated by traversing the environment's underlying Finite State Machine (FSM) and programmatically verified through execution, demonstrating the end-to-end process described in Section \ref{['sec:autowebworld']}.
  • Figure 4: Scaling Curve on WebVoyager and Online-Mind2Web.
  • Figure 5: Ablation Study of Grounding Training Data on Qwen2.5-VL-3B.
  • ...and 2 more figures