Go-Browse: Training Web Agents with Structured Exploration

Apurva Gandhi, Graham Neubig

TL;DR

Go-Browse is a method for automatically collecting diverse and realistic web agent data at scale through structured exploration of web environments. It achieves efficient exploration by framing data collection as a graph search, which enables reuse of information across exploration episodes.

Abstract

One of the fundamental problems in digital agents is their lack of understanding of their environment. For instance, a web browsing agent may get lost in unfamiliar websites, uncertain what pages must be visited to achieve its goals. To address this, we propose Go-Browse, a method for automatically collecting diverse and realistic web agent data at scale through structured exploration of web environments. Go-Browse achieves efficient exploration by framing data collection as a graph search, enabling reuse of information across exploration episodes. We instantiate our method on the WebArena benchmark, collecting a dataset of 10K successful task-solving trajectories and 40K interaction steps across 100 URLs. Fine-tuning a 7B parameter language model on this dataset achieves a success rate of 21.7% on the WebArena benchmark, beating GPT-4o mini by 2.4% and exceeding current state-of-the-art results for sub-10B parameter models by 2.9%.

Paper Structure

This paper contains 22 sections, 9 figures, 9 tables, 3 algorithms.

Figures (9)

  • Figure 1: Overview of the Go-Browse algorithm for web agent data collection for a website. Go-Browse's outer-loop (left) maintains an exploration frontier of discovered but not yet fully explored webpages. Go-Browse's inner loop (right) explores each webpage in the frontier by (1) Proposing tasks for that webpage that are grounded in interaction; (2) Checking the feasibility of those tasks; and (3) Sampling trajectories and discovering new webpages by solving feasible tasks.
  • Figure 2: Interaction-first Exploration
  • Figure 3: Dataset statistics on the 5 WebArena domains (20 pages explored/domain).
  • Figure 4: Proportion of successful trajectories from each model in the Go-Browse-WA dataset.
  • Figure 5: Task diversity of the Go-Browse and NNetNav datasets. Zoom to read sub-task labels.
  • ...and 4 more figures
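
The outer and inner loops described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`go_browse_sketch`, `propose_tasks`, `is_feasible`, `solve_task`), the first-in-first-out frontier order, and the toy site are all assumptions made for illustration. In the paper these steps are carried out by language-model agents in a real browser; here they are plain callables. The default page budget of 20 mirrors the "20 pages explored/domain" noted in the Figure 3 caption.

```python
from collections import deque
from typing import Callable, Iterable


def go_browse_sketch(
    start_url: str,
    propose_tasks: Callable[[str], Iterable[str]],   # hypothetical: page -> candidate tasks
    is_feasible: Callable[[str, str], bool],         # hypothetical: (page, task) -> feasible?
    solve_task: Callable[[str, str], list[str]],     # hypothetical: (page, task) -> URLs visited
    max_pages: int = 20,
) -> tuple[list[tuple[str, str, list[str]]], list[str]]:
    """Graph-search style exploration: pages are nodes, the frontier holds
    pages that were discovered but not yet explored."""
    frontier = deque([start_url])   # outer loop state (FIFO order is an assumption)
    discovered = {start_url}        # every page seen so far, to avoid re-queuing
    explored: list[str] = []
    dataset: list[tuple[str, str, list[str]]] = []

    while frontier and len(explored) < max_pages:
        page = frontier.popleft()
        explored.append(page)
        # Inner loop: (1) propose tasks, (2) check feasibility, (3) solve and discover.
        for task in propose_tasks(page):
            if not is_feasible(page, task):
                continue
            visited_urls = solve_task(page, task)
            dataset.append((page, task, visited_urls))
            for url in visited_urls:
                if url not in discovered:
                    discovered.add(url)
                    frontier.append(url)
    return dataset, explored


# Toy stand-ins for the agent components (purely illustrative).
TOY_SITE = {"/home": ["/shop", "/about"], "/shop": ["/cart"], "/about": [], "/cart": []}


def toy_propose(page: str) -> list[str]:
    # One navigation task per link, plus one task that cannot be completed.
    return [f"go to {link}" for link in TOY_SITE[page]] + [f"impossible on {page}"]


def toy_feasible(page: str, task: str) -> bool:
    return not task.startswith("impossible")


def toy_solve(page: str, task: str) -> list[str]:
    # A "trajectory" here is just the start page and the page the task leads to.
    return [page, task.removeprefix("go to ")]


toy_dataset, toy_explored = go_browse_sketch("/home", toy_propose, toy_feasible, toy_solve)
```

Because the set of discovered pages persists across iterations, each page is queued once and pages found while solving one task become starting points for later ones, which is one way to read the "reuse of information across exploration episodes" mentioned in the abstract.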