SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Zonghao Ying, Yangguang Shao, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu

TL;DR

SecureWebArena addresses the lack of holistic security benchmarks for LVLM-based web agents by introducing a unified, multi-environment testbed and a three-layer evaluation protocol that probes internal reasoning, behavior, and outcomes under six attack vectors. The framework combines six realistic web environments, 2,970 trajectories, and 330 adversarial tasks to reveal vulnerabilities across user- and environment-level threats. Experiments on nine LVLMs show universal susceptibility to visually grounded attacks and trade-offs between model specialization and security, highlighting the need for robust defenses. The benchmark provides a diagnostic tool and a foundation for safer deployment of autonomous web agents.

Abstract

Large vision-language model (LVLM)-based web agents are emerging as powerful tools for automating complex online tasks. However, when deployed in real-world environments, they face serious security risks, motivating the design of security evaluation benchmarks. Existing benchmarks provide only partial coverage, typically restricted to narrow scenarios such as user-level prompt manipulation, and thus fail to capture the broad range of agent vulnerabilities. To address this gap, we present SecureWebArena, the first holistic benchmark for evaluating the security of LVLM-based web agents. SecureWebArena first introduces a unified evaluation suite comprising six simulated but realistic web environments (e.g., e-commerce platforms, community forums) and includes 2,970 high-quality trajectories spanning diverse tasks and attack settings. The suite defines a structured taxonomy of six attack vectors spanning both user-level and environment-level manipulations. In addition, we introduce a multi-layered evaluation protocol that analyzes agent failures across three critical dimensions: internal reasoning, behavioral trajectory, and task outcome, facilitating a fine-grained risk analysis that goes far beyond simple success metrics. Using this benchmark, we conduct large-scale experiments on 9 representative LVLMs, which fall into three categories: general-purpose, agent-specialized, and GUI-grounded. Our results show that all tested agents are consistently vulnerable to subtle adversarial manipulations and reveal critical trade-offs between model specialization and security. By providing (1) a comprehensive benchmark suite with diverse environments and a multi-layered evaluation pipeline, and (2) empirical insights into the security challenges of modern LVLM-based web agents, SecureWebArena establishes a foundation for advancing trustworthy web agent deployment.

Paper Structure

This paper contains 36 sections, 3 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overall illustration of our SecureWebArena, the first holistic benchmark for evaluating the security of LVLM-based web agents.
  • Figure 2: SecureWebArena framework. It integrates simulated environments, diverse attack vectors, and multi-level evaluation to assess agent safety under adversarial manipulation.
  • Figure 3: Overall comparison of agents’ vulnerability scores (RVR, BCR, and PDR) across 6 attack vectors.
  • Figure 4: Comparison of vulnerability scores (RVR, BCR, and PDR) of representative LVLM-based agents across 6 attack vectors.
  • Figure 5: Examples of evaluated environments.
  • ...and 2 more figures