RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Zeyi Liao, Jaylen Jones, Linxi Jiang, Yuting Ning, Eric Fosler-Lussier, Yu Su, Zhiqiang Lin, Huan Sun

TL;DR

RedTeamCUA addresses the risk of indirect prompt injection for computer-use agents (CUAs) by introducing a hybrid, sandboxed evaluation platform that combines a VM-based OS (OSWorld) with self-hosted web replicas (Dockerized OwnCloud, Forum, and RocketChat). Coupled with RTC-Bench, a benchmark of 864 adversarial examples, the framework enables both end-to-end and decoupled adversarial testing across web and OS environments. Empirical results show that frontier CUAs are substantially susceptible, with an attack success rate (ASR) of up to 66.2% in the decoupled setting and 60% in the end-to-end setting, and Attempt Rates of up to 92.5%. These results underscore the need for robust defenses against indirect prompt injection even as agent capabilities improve. The work also evaluates system-level and model-level defenses and finds them largely insufficient, which motivates research on CUA-specific defenses and safer interaction protocols before real-world deployment.

Abstract

Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection. Current evaluations of this threat either lack support for realistic but controlled environments or ignore hybrid web-OS attack scenarios involving both interfaces. To address this, we propose RedTeamCUA, an adversarial testing framework featuring a novel hybrid sandbox that integrates a VM-based OS environment with Docker-based web platforms. Our sandbox supports key features tailored for red teaming, such as flexible adversarial scenario configuration and a setting that decouples adversarial evaluation from the navigational limitations of CUAs by initializing tests directly at the point of an adversarial injection. Using RedTeamCUA, we develop RTC-Bench, a comprehensive benchmark with 864 examples that investigate realistic, hybrid web-OS attack scenarios and fundamental security vulnerabilities. Benchmarking current frontier CUAs identifies significant vulnerabilities: Claude 3.7 Sonnet | CUA demonstrates an attack success rate (ASR) of 42.9%, while Operator, the most secure CUA evaluated, still exhibits an ASR of 7.6%. Notably, CUAs often attempt to execute adversarial tasks, with an Attempt Rate as high as 92.5%, although capability limitations can prevent them from completing these tasks. Nevertheless, we observe concerningly high ASRs in realistic end-to-end settings, with the strongest-to-date Claude 4.5 Sonnet | CUA exhibiting the highest ASR of 60%, indicating that CUA threats can already result in tangible risks to users and computer systems. Overall, RedTeamCUA provides an essential framework for advancing realistic, controlled, and systematic analysis of CUA vulnerabilities, highlighting the urgent need for robust defenses against indirect prompt injection prior to real-world deployment.
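
To make the two headline metrics concrete, the following is a minimal sketch of how an attack success rate and an Attempt Rate could be computed from per-example outcomes. The `Outcome` record, its field names, and the rule that a completed attack also counts as an attempt are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical per-example record (field names are assumptions)."""
    attempted: bool  # the agent started pursuing the injected task
    succeeded: bool  # the injected task was fully completed


def attack_rates(outcomes):
    """Return (ASR, Attempt Rate) as percentages over all examples."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    total = len(outcomes)
    successes = sum(o.succeeded for o in outcomes)
    # Assumption: a completed attack is also counted as an attempt.
    attempts = sum(o.attempted or o.succeeded for o in outcomes)
    return 100.0 * successes / total, 100.0 * attempts / total


# Toy data: one success, two failed attempts, one ignored injection.
toy = [
    Outcome(attempted=True, succeeded=True),
    Outcome(attempted=True, succeeded=False),
    Outcome(attempted=True, succeeded=False),
    Outcome(attempted=False, succeeded=False),
]
asr, attempt_rate = attack_rates(toy)
```

Under this reading, the gap between the two numbers reflects injections the agent pursued but could not finish, which is the pattern the abstract attributes to capability limitations.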

Paper Structure

This paper contains 45 sections, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Our RedTeamCUA framework features a hybrid environment sandbox, combining a VM-based OS and Docker-based web replicas, to enable controlled and systematic analysis of CUA vulnerabilities in adversarial scenarios spanning both web and OS environments. A high-resolution screenshot of the forum webpage containing the injection is shown in Figure 4.
  • Figure 2: ASR breakdown by web platform and CIA categories.
  • Figure 3: A 1080p screenshot showcasing a code-based injection on the RocketChat platform, aiming to compromise users' confidentiality.
  • Figure 4: A 1080p screenshot showcasing a language-based injection on the Forum platform, aiming to compromise system integrity.
  • Figure 5: A 1080p screenshot showcasing a code-based injection on the OwnCloud platform, aiming to compromise system availability.
  • ...and 4 more figures