Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

Erfan Shayegani, Keegan Hines, Yue Dong, Nael Abu-Ghazaleh, Roman Lutz, Spencer Whitehead, Vidhisha Balachandran, Besmira Nushi, Vibhav Vineet

TL;DR

The paper identifies Blind Goal-Directedness (BGD) in Computer-Use Agents (CUAs): a persistent bias toward pursuing user goals regardless of feasibility, safety, or context. It introduces Blind-Act, a 90-task benchmark built on OSWorld that systematically provokes BGD across three patterns: lack of contextual reasoning, assumptions under ambiguity, and contradictory or infeasible goals. LLM-based judges, which agree with human annotations 93.75% of the time, score both BGD and Completion across nine frontier models, and the average BGD rate is high (80.8%). Prompting interventions reduce BGD but do not eliminate it. Qualitative analysis surfaces three failure modes: execution-first bias, thought-action disconnect, and request-primacy. The authors argue for stronger training- or inference-time safeguards and propose trajectory-level monitoring and mitigation strategies for safer CUA deployment, positioning Blind-Act as a foundation for future alignment research and practical safety enhancements.

Abstract

Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals three failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions because the user requested them). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this fundamental risk and ensuring safe CUA deployment.
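
To make the headline metrics concrete, the following is a minimal sketch of how a BGD rate and a judge-human agreement rate could be computed from per-task verdicts. It is not the authors' evaluation code: the `TaskResult` record, its field names, and the toy data are assumptions introduced purely for illustration, and the numbers it prints are not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative assumptions."""
    task_id: str
    judge_bgd: bool                   # LLM judge says the agent showed BGD
    completed: bool                   # LLM judge says the agent completed the task
    human_bgd: Optional[bool] = None  # human label, when one exists


def percent(flags: Iterable[bool]) -> float:
    """Share of True values, as a percentage (0.0 for an empty input)."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def bgd_rate(results: List[TaskResult]) -> float:
    """Percentage of tasks on which the judge flagged BGD."""
    return percent(r.judge_bgd for r in results)


def completion_rate(results: List[TaskResult]) -> float:
    """Percentage of tasks the judge marked as completed."""
    return percent(r.completed for r in results)


def judge_human_agreement(results: List[TaskResult]) -> float:
    """Percentage of human-annotated tasks where judge and human agree on BGD."""
    annotated = [r for r in results if r.human_bgd is not None]
    return percent(r.judge_bgd == r.human_bgd for r in annotated)


# Toy data only, made up for this example.
toy = [
    TaskResult("t1", judge_bgd=True, completed=True, human_bgd=True),
    TaskResult("t2", judge_bgd=True, completed=False, human_bgd=True),
    TaskResult("t3", judge_bgd=False, completed=False, human_bgd=False),
    TaskResult("t4", judge_bgd=True, completed=True, human_bgd=False),
]

print(bgd_rate(toy))               # 75.0
print(completion_rate(toy))        # 50.0
print(judge_human_agreement(toy))  # 75.0
```

Keeping BGD and Completion as separate flags mirrors the abstract's distinction between an agent exhibiting the bias and an agent actually finishing the task; an agent can pursue an undesirable goal without completing it.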

Paper Structure

This paper contains 46 sections, 20 figures, 5 tables.

Figures (20)

  • Figure 1: Overview of Blind Goal-Directedness (BGD) in Computer-Use Agents (CUAs). (A) BGD examples: sending an image to a child while ignoring its violent content, assuming citizenship and disability to reduce taxes, and disabling the firewall to "enhance security" despite the contradiction. (B) Our benchmark, Blind-Act, includes 90 tasks across three patterns of BGD: lack of contextual reasoning, assumptions and decisions under ambiguity, and contradictory or infeasible goals, built on realistic OSWorld Ubuntu VMs. (C) Evaluating nine frontier models, we find high BGD rates (80.8%), with prompting interventions only partly reducing risk. Qualitative analysis also reveals three failure modes: execution-first bias, thought–action disconnect, and request-primacy.
  • Figure 2: Average Blind Goal-Directedness (BGD) and Completion on Blind-Act under the Default, Contextual, and Reflective system prompts.
  • Figure 3: Example from GPT-4.1 showing execution-first bias: the agent focuses on how to act through GUI operations while overlooking whether it should act given the harmful context.
  • Figure 4: Example from o4-mini showing thought–action disconnect under the Contextual prompt: the agent reasons that 777 is insecure and 700 is safer, but in action still sets the permission to 777.
  • Figure 5: Example from Claude Sonnet 4 showing request-primacy under the Reflective prompt: the agent notes the safety concern, then justifies proceeding because the user requested it.
  • ...and 15 more figures