Safe and Scalable Web Agent Learning via Recreated Websites

Hyungjoo Chae, Jungsoo Park, Alan Ritter

TL;DR

Experiments on web agent benchmarks show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments.

Abstract

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments. By exposing controlled internal access via a Python SDK, VeriEnv enables agents to self-generate tasks with deterministic, programmatically verifiable rewards, eliminating reliance on heuristic or LLM-based judges. This design decouples agent learning from unsafe real-world interaction while enabling scalable self-evolution through environment expansion. Through experiments on web agent benchmarks, we show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments. Code and resources will be released at https://github.com/kyle8581/VeriEnv upon acceptance.

Paper Structure

This paper contains 46 sections, 29 figures, and 6 tables.

Figures (29)

  • Figure 1: Comparison between the traditional self-evolution paradigm and our verifiable environment framework. (a) In traditional settings, agents interact directly with real-world environments and rely on unvalidated synthetic tasks and non-verifiable, LLM-based reward signals, leading to unsafe exploration and unreliable learning. (b) In contrast, VeriEnv clones real-world websites into synthetic environments with full internal access, enabling safe exploration, validated task generation, and deterministic, verifiable reward signals for stable and scalable agent learning.
  • Figure 2: Overview of VeriEnv. VeriEnv first clones a real website into a fully instrumented synthetic environment (code $C$, database $D$, and a Python SDK $P$) via a coding agent. It then uses task and judge generators to produce tasks of varying difficulty, and verifies both tasks and judges by interacting with the website and database through the SDK, yielding deterministic, verified rewards for agent learning.
  • Figure 3: Example of a verifiable task with executable validation in a synthetic recipe website (i.e., cloned from apartments.com).
  • Figure 4: Site-specific self-evolving training within a cloned synthetic environment. Agents are trained on a fixed target website using automatically generated tasks and verifiable reward signals.
  • Figure 5: Analysis on the scaling effect of the number of websites.
  • ...and 24 more figures
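
The idea of a verifiable task with executable validation, as described in the abstract and in the captions of Figures 2 and 3, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the VeriEnv code: the `ToySDK` class, the `favorites` table, the `Task` dataclass, and the `reward` function are all hypothetical, and an in-memory SQLite database stands in for the cloned site's database $D$. The sketch shows only the general pattern: a judge is an executable check over environment state, so the reward is deterministic and needs no LLM-based judge.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


class ToySDK:
    """Hypothetical stand-in for the SDK P: read-only queries over the database D."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def is_favorited(self, user: str, listing_id: int) -> bool:
        # True if the (user, listing_id) pair exists in the favorites table.
        row = self._conn.execute(
            "SELECT 1 FROM favorites WHERE user = ? AND listing_id = ? LIMIT 1",
            (user, listing_id),
        ).fetchone()
        return row is not None


@dataclass
class Task:
    instruction: str                 # natural-language goal given to the agent
    judge: Callable[[ToySDK], bool]  # executable check over environment state


def reward(task: Task, sdk: ToySDK) -> float:
    # Deterministic binary reward: 1.0 if the judge passes, otherwise 0.0.
    return 1.0 if task.judge(sdk) else 0.0


def make_env() -> sqlite3.Connection:
    # Hypothetical schema; a real cloned site would have its own tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE favorites (user TEXT, listing_id INTEGER)")
    return conn


conn = make_env()
sdk = ToySDK(conn)
task = Task(
    instruction="Save listing 42 to alice's favorites.",
    judge=lambda s: s.is_favorited("alice", 42),
)

reward_before = reward(task, sdk)  # judge fails: nothing has been saved yet

# Simulate the effect of a successful agent action on the site's database.
conn.execute("INSERT INTO favorites VALUES (?, ?)", ("alice", 42))

reward_after = reward(task, sdk)   # judge passes after the state change
```

Because the judge reads database state rather than page text, the same check gives the same answer on every run, which is the property the abstract refers to as a programmatically verifiable reward.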