Safe and Scalable Web Agent Learning via Recreated Websites

Hyungjoo Chae; Jungsoo Park; Alan Ritter

TL;DR

Experiments on web agent benchmarks show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments.

Abstract

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments. By exposing controlled internal access via a Python SDK, VeriEnv enables agents to self-generate tasks with deterministic, programmatically verifiable rewards, eliminating reliance on heuristic or LLM-based judges. This design decouples agent learning from unsafe real-world interaction while enabling scalable self-evolution through environment expansion. Through experiments on web agent benchmarks, we show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments. Code and resources will be released at https://github.com/kyle8581/VeriEnv upon acceptance.
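The idea of a deterministic, programmatically verifiable reward can be illustrated with a minimal sketch. It is not the VeriEnv SDK: the in-memory SQLite database stands in for a cloned site's backend, and the `favorites` table, the task ("save listing 42 for user 1"), and every function name are hypothetical choices made for illustration only.

```python
import sqlite3


def make_environment() -> sqlite3.Connection:
    """Create a toy in-memory database standing in for a cloned site's backend (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE favorites (user_id INTEGER, listing_id INTEGER)")
    conn.commit()
    return conn


def judge_listing_saved(conn: sqlite3.Connection, user_id: int, listing_id: int) -> float:
    """Hypothetical judge: reward is 1.0 only if the expected row exists in the database."""
    row = conn.execute(
        "SELECT COUNT(*) FROM favorites WHERE user_id = ? AND listing_id = ?",
        (user_id, listing_id),
    ).fetchone()
    return 1.0 if row[0] > 0 else 0.0


def simulate_agent_action(conn: sqlite3.Connection, user_id: int, listing_id: int) -> None:
    """Stand-in for the state change an agent would cause by clicking 'save' in the browser."""
    conn.execute("INSERT INTO favorites VALUES (?, ?)", (user_id, listing_id))
    conn.commit()


conn = make_environment()
reward_before = judge_listing_saved(conn, user_id=1, listing_id=42)  # task not yet done
simulate_agent_action(conn, user_id=1, listing_id=42)
reward_after = judge_listing_saved(conn, user_id=1, listing_id=42)   # task completed
reward_other = judge_listing_saved(conn, user_id=1, listing_id=7)    # a different task
```

Because the judge inspects backend state rather than asking a model to grade a trajectory, the same final state always yields the same reward, which is the property the abstract contrasts with heuristic or LLM-based judges.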

Paper Structure (46 sections, 29 figures, 6 tables)

This paper contains 46 sections, 29 figures, 6 tables.

Introduction
Related Work
Agent learning with verifiable reward.
Self-evolving agents.
Coding agents for web development.
Method
Recreating Real-World Websites
Verifiable Task and Judge Generation
Self-Evolving Agent Learning in Verifiable Environments
Environment Statistics and Human Evaluation
Experiments
Generalization Across Websites
Implementation details.
Benchmarks and baselines.
Result.
...and 31 more sections

Figures (29)

Figure 1: Comparison between the traditional self-evolution paradigm and our verifiable environment framework. (a) In traditional settings, agents interact directly with real-world environments and rely on unvalidated synthetic tasks and non-verifiable, LLM-based reward signals, leading to unsafe exploration and unreliable learning. (b) In contrast, VeriEnv clones real-world websites into synthetic environments with full internal access, enabling safe exploration, validated task generation, and deterministic, verifiable reward signals for stable and scalable agent learning.
Figure 2: Overview of VeriEnv. VeriEnv first clones a real website into a fully instrumented synthetic environment (code $C$, database $D$, and a Python SDK $P$) via a coding agent. It then uses task and judge generators to produce tasks at varying difficulty and verifies both tasks and judges by interacting with the website and database through the SDK, yielding deterministic, verified rewards for agent learning.
Figure 3: Example of a verifiable task with executable validation in a synthetic recipe website (i.e., cloned from apartments.com).
Figure 4: Site-specific self-evolving training within a cloned synthetic environment. Agents are trained on a fixed target website using automatically generated tasks and verifiable reward signals.
Figure 5: Analysis on the scaling effect of the number of websites.
...and 24 more figures
