PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang
TL;DR
PC-Agent automates complex PC productivity tasks by combining an Active Perception Module for fine-grained on-screen sensing with a hierarchical multi-agent framework that decomposes decision-making into Instruction, Subtask, and Action levels, augmented by a Reflection agent for dynamic error handling. Precise perception (A11y-based element sensing and OCR-based text extraction) and structured collaboration (Manager, Progress, and Decision agents) together enable robust long-horizon task execution across multi-app workflows. On the PC-Eval benchmark, PC-Agent achieves a Subtask SR of 76.0% and an overall Success Rate (SR) of 56.0%, outperforming UFO and Agent-S by 44 and 32 percentage points in SR, respectively. These results indicate improved reliability and practicality of GUI-driven PC task automation and position PC-Eval as a realistic benchmark for future GUI agents.
Abstract
In the field of MLLM-based GUI agents, compared to smartphones, the PC scenario not only features a more complex interactive environment, but also involves more intricate intra- and inter-app workflows. To address these issues, we propose a hierarchical agent framework named PC-Agent. Specifically, from the perception perspective, we devise an Active Perception Module (APM) to overcome the inadequate abilities of current MLLMs in perceiving screenshot content. From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making processes into Instruction-Subtask-Action levels. Within this architecture, three agents (i.e., Manager, Progress, and Decision) are set up for instruction decomposition, progress tracking, and step-by-step decision-making, respectively. Additionally, a Reflection agent is adopted to enable timely bottom-up error feedback and adjustment. We also introduce a new benchmark, PC-Eval, with 25 real-world complex instructions. Empirical results on PC-Eval show that our PC-Agent achieves a 32% absolute improvement in task success rate over previous state-of-the-art methods. The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/PC-Agent.
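
The Instruction-Subtask-Action hierarchy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name below (`manager`, `decision`, `reflection`, `Progress`, `run`) is hypothetical, the agents are plain-function stubs rather than MLLM calls, and splitting an instruction on the word "then" is an assumption made only to keep the example self-contained. It shows only the control flow: the Manager decomposes the instruction, the Decision agent proposes one action at a time, the Reflection agent feeds failures back so the next decision can adjust, and the Progress agent records completed actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Progress:
    """Progress agent stub: keeps a record of completed actions."""
    done: List[str] = field(default_factory=list)

    def update(self, action: str) -> None:
        self.done.append(action)


def manager(instruction: str) -> List[str]:
    """Manager agent stub: decompose an instruction into subtasks.

    Assumption: subtasks are separated by ' then '. A real Manager
    would query an MLLM instead of splitting a string.
    """
    return [p.strip() for p in instruction.split(" then ") if p.strip()]


def decision(subtask: str, progress: Progress, feedback: str) -> str:
    """Decision agent stub: choose the next action for the subtask.

    The previous Reflection feedback changes the choice (retry vs. do).
    """
    prefix = "retry" if feedback == "failed" else "do"
    return f"{prefix}: {subtask}"


def reflection(action: str, execute: Callable[[str], bool]) -> str:
    """Reflection agent stub: execute the action and judge the outcome."""
    return "ok" if execute(action) else "failed"


def run(instruction: str, execute: Callable[[str], bool],
        max_steps: int = 3) -> List[str]:
    """Run the Instruction -> Subtask -> Action loop and return a log."""
    log: List[str] = []
    progress = Progress()
    for subtask in manager(instruction):      # instruction level
        feedback = "ok"
        for _ in range(max_steps):            # action level
            action = decision(subtask, progress, feedback)
            feedback = reflection(action, execute)  # bottom-up feedback
            log.append(f"{action} -> {feedback}")
            if feedback == "ok":
                progress.update(action)       # subtask-level tracking
                break
    return log
```

Calling `run("open Chrome then search weather", execute)` with any `execute` callable that returns `True` on success yields one log line per attempted action. A failed attempt is followed by a `retry:` action for the same subtask, up to `max_steps` attempts.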
