Comparing Human Oversight Strategies for Computer-Use Agents

Chaoran Chen, Zhiping Zhang, Zeya Chen, Eryue Xu, Yinuo Yang, Ibrahim Khalilov, Simret A Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li

Abstract

LLM-powered computer-use agents (CUAs) are shifting users from direct manipulation to supervisory coordination. Existing oversight mechanisms, however, have largely been studied as isolated interface features, making broader oversight strategies difficult to compare. We conceptualize CUA oversight as a structural coordination problem defined by delegation structure and engagement level, and use this lens to compare four oversight strategies in a mixed-methods study with 48 participants in a live web environment. Our results show that oversight strategy more reliably shaped users' exposure to problematic actions than their ability to correct those actions once they became visible. Plan-based strategies were associated with lower rates of problematic agent actions, but not with equally strong gains in runtime intervention success once such actions became visible. On subjective measures, no single strategy was uniformly best, and the clearest context-sensitive differences appeared in trust. Qualitative findings further suggest that intervention depended not only on what controls users retained, but also on whether risky moments became legible as requiring judgment during execution. These findings suggest that effective CUA oversight is not achieved by maximizing human involvement alone. Instead, it depends on how supervision is structured to surface decision-critical moments and support their recognition in time for meaningful intervention.

Paper Structure

This paper contains 96 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Design space of CUA oversight strategies defined by delegation structure (where default decision authority resides) and engagement level (the level of the workflow at which human oversight is primarily organized).
  • Figure 2: Interface instantiations of the four oversight strategies used in our study. (1) Risk-Gated: (a) agent focus and reasoning; (b) risk-triggered approval dialog. (2) Supervisory Co-Execution: (a) plan review; (b) hierarchical execution trace; (c) next-step approval. (3) Action Confirmation: (a) per-action confirmation dialog. (4) Structurally Enriched: (a) plan review; (b) agent focus, reasoning, and risk-triggered approval; (c) step-level execution trace with inspectable risk labels; (d) plan studio for progress review and plan revision.
  • Figure 3: Loan pre-qualification task with embedded privacy leakage risk.
  • Figure 4: Flight booking task with embedded privacy leakage risk.
  • Figure 5: Benefits application task with embedded privacy leakage risk.
  • ...and 5 more figures