Secure and Efficient Access Control for Computer-Use Agents via Context Space
Haochen Gong, Chenxiao Li, Rui Chang, Wenbo Shen
TL;DR
The paper tackles the security risks and usability costs of LLM-based computer-use agents (CUAs) by proposing CSAgent, a static policy-based access control framework. It introduces an intent-aware context space and an OS-level service that enforces per-application policies across API, CLI, and GUI interfaces, avoiding costly runtime policy generation. A language-model-driven context analyzer, an optimized context manager, and a policy evolution framework enable automated, controllable policy construction and iterative refinement. Evaluation on AgentBench, AgentDojo, and AndroidWorld shows that CSAgent blocks more than 99.56% of attacks while adding only 1.99% performance overhead, and that it identifies 1.93×–4.12× more GUI elements, demonstrating practical protection for diverse CUAs.
Abstract
Large language model (LLM)-based computer-use agents represent a convergence of AI and OS capabilities, allowing users to control system- and application-level functions through natural language. However, because LLM outputs are inherently uncertain, granting agents control over computers poses significant security risks. When agent actions deviate from user intentions, the consequences can be irreversible. Existing mitigation approaches, such as user confirmation and LLM-based dynamic action validation, still suffer from limitations in usability, security, and performance. To address these challenges, we propose CSAgent, a system-level, static policy-based access control framework for computer-use agents. To bridge the gap between static policies on the one hand and dynamic context and user intent on the other, CSAgent introduces intent- and context-aware policies and provides an automated toolchain that helps developers construct and refine them. CSAgent enforces these policies through an optimized OS service, ensuring that agent actions can be executed only under specific user intents and contexts. CSAgent protects agents that control computers through diverse interfaces, including API, CLI, and GUI. We implement and evaluate CSAgent, which successfully defends against more than 99.56% of attacks while introducing only 1.99% performance overhead.
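
To make the idea of an intent- and context-aware static policy concrete, the following is a minimal illustrative sketch, not CSAgent's actual policy format or API. The names `Policy`, `is_allowed`, and `POLICIES`, the intent labels, and the context keys (`app`, `screen`) are all hypothetical and chosen only for illustration. The sketch assumes a default-deny model in which an action is permitted only if some pre-written policy matches the action, the declared user intent, and the current context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical static policy: one action, the intents and context it requires."""
    action: str
    allowed_intents: frozenset
    required_context: tuple  # (key, value) pairs that must all hold

    def permits(self, action, intent, context):
        # The policy matches only if action, intent, and every context pair agree.
        return (
            action == self.action
            and intent in self.allowed_intents
            and all(context.get(k) == v for k, v in self.required_context)
        )


def is_allowed(policies, action, intent, context):
    # Default deny: an action runs only if at least one policy permits it.
    return any(p.permits(action, intent, context) for p in policies)


# Hypothetical per-application policy set, written ahead of time (statically).
POLICIES = [
    Policy(
        action="send_email",
        allowed_intents=frozenset({"compose_message"}),
        required_context=(("app", "mail"), ("screen", "compose")),
    ),
]

compose_ctx = {"app": "mail", "screen": "compose"}
print(is_allowed(POLICIES, "send_email", "compose_message", compose_ctx))
print(is_allowed(POLICIES, "send_email", "read_inbox", compose_ctx))
```

In this sketch the second check is denied because the intent does not match, even though the action and context do. Because the policies are fixed data, each check is a cheap lookup rather than a runtime LLM call, which is the general motivation the abstract gives for a static approach.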
