Atomicity for Agents: Exposing, Exploiting, and Mitigating TOCTOU Vulnerabilities in Browser-Use Agents

Linxi Jiang, Zhijie Liu, Haotian Luo, Zhiqiang Lin

TL;DR

A lightweight mitigation based on pre-execution validation monitors DOM and layout changes during planning and validates the page state immediately before action execution. This reduces the risk of insecure execution and mitigates unintended side effects in browser-use agents.

Abstract

Browser-use agents are widely used for everyday tasks. They enable automated interaction with web pages through structured DOM-based interfaces or vision-language models operating on page screenshots. However, web pages often change between planning and execution, causing agents to execute actions based on stale assumptions. We view this temporal mismatch as a time-of-check to time-of-use (TOCTOU) vulnerability in browser-use agents. Dynamic or adversarial web content can exploit this window to induce unintended actions. We present a large-scale empirical study of TOCTOU vulnerabilities in browser-use agents using a benchmark that spans synthesized and real-world websites. Using this benchmark, we evaluate 10 popular open-source agents and show that TOCTOU vulnerabilities are widespread. We design a lightweight mitigation based on pre-execution validation. It monitors DOM and layout changes during planning and validates the page state immediately before action execution. This approach reduces the risk of insecure execution and mitigates unintended side effects in browser-use agents.
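
The pre-execution validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the page model (a dictionary mapping element ids to a tag and bounding box) and the names `page_fingerprint`, `plan_click`, and `execute_if_stable` are hypothetical, chosen only for illustration. The sketch records a fingerprint of the DOM and layout at planning time and refuses to act if the fingerprint differs at execution time.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical page model, assumed for illustration only.
Box = Tuple[int, int, int, int]      # (x, y, width, height)
Page = Dict[str, Tuple[str, Box]]    # element id -> (tag, bounding box)


def page_fingerprint(page: Page) -> str:
    """Hash a canonical serialization of the DOM/layout snapshot."""
    canonical = repr(sorted(page.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class PlannedAction:
    target_id: str
    planned_fingerprint: str  # page state observed at check time


def plan_click(page: Page, target_id: str) -> PlannedAction:
    """Check time: choose a target and remember the state it was chosen in."""
    return PlannedAction(target_id, page_fingerprint(page))


def execute_if_stable(page: Page, action: PlannedAction) -> str:
    """Use time: act only if the page still matches the planned state."""
    if page_fingerprint(page) != action.planned_fingerprint:
        return "replan"
    return f"clicked {action.target_id}"


page: Page = {"subscribe": ("button", (100, 200, 80, 30))}
action = plan_click(page, "subscribe")
print(execute_if_stable(page, action))   # clicked subscribe

# A delayed overlay appears between planning and execution.
page["ad_overlay"] = ("div", (90, 190, 120, 60))
print(execute_if_stable(page, action))   # replan
```

A whole-page fingerprint is deliberately coarse: any change forces a replan. A real system would likely scope the check to the target and the elements that can overlap it, which this sketch does not attempt.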

Paper Structure (54 sections, 3 equations, 8 figures, 3 tables, 1 algorithm)

Figures (8)

  • Figure 1: A real-world TOCTOU example on the Forbes homepage. The green region indicates the intended target area at $t_1$. A delayed advertisement overlay (red region) appears at $t_2$ and overlaps the target, so a subsequent click at $t_3$ can become an unintended ad click that redirects to an advertisement page.
  • Figure 2: A TOCTOU window in the browser-use agent loop. The agent selects $a_{\text{plan}}$ from $o_{t_{\text{plan}}}$, but the page changes before $t_{\text{act}}$, so $a_{\text{plan}}$ may apply to a different target.
  • Figure 3: A DynWeb instance for Type I (UI changes). An adversary-controlled origin injects a delayed overlay between check time ($t_1$) and use time ($t_3$), causing the agent's click to resolve to an unintended control and redirecting it into an adversary-chosen flow.
  • Figure 4: Mitigation Framework. The agent plans actions while monitoring DOM and layout changes, and execution proceeds only if validation confirms stability.
  • Figure 5: Trigger ratio of TOCTOU vulnerabilities across three manipulation types. Here, n counts the number of cases per type, including both synthesized scenarios and real-world websites.
  • ...and 3 more figures
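
The overlay scenario in the captions of Figures 1 and 3 can be reproduced with a toy hit test. The sketch below is an assumption-laden simplification, not the paper's benchmark: elements are plain tuples with a z-index, the helper `element_at` is hypothetical, and real browser hit testing (stacking contexts, `pointer-events`, and so on) is more involved. It shows how a click at fixed coordinates resolves to the intended link at $t_1$ but to an overlay once one has been injected.

```python
from typing import List, Optional, Tuple

# (element id, x, y, width, height, z-index); all values are illustrative.
Element = Tuple[str, int, int, int, int, int]


def element_at(elements: List[Element], x: int, y: int) -> Optional[str]:
    """Return the id of the topmost element whose box contains (x, y)."""
    hits = [
        e for e in elements
        if e[1] <= x < e[1] + e[3] and e[2] <= y < e[2] + e[4]
    ]
    if not hits:
        return None
    return max(hits, key=lambda e: e[5])[0]


# t1: the agent observes the page and plans a click on the article link.
page_t1: List[Element] = [("article_link", 100, 200, 300, 40, 0)]
click = (150, 220)
target_at_plan = element_at(page_t1, *click)

# t2: a delayed overlay with a higher z-index covers the same region.
page_t2 = page_t1 + [("ad_overlay", 50, 150, 400, 200, 10)]

# t3: the same coordinates now resolve to the overlay.
target_at_use = element_at(page_t2, *click)

print(target_at_plan, target_at_use)   # article_link ad_overlay
```

Comparing the element resolved at planning time with the element resolved just before the click is one simple way to detect that the planned action no longer applies to its intended target.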