Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators
Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yan Lu
TL;DR
This work introduces ResponsibleTA, a multi-modal framework that empowers LLMs to act as responsible task automators by integrating feasibility prediction, completeness verification, and security protection. It compares prompt-based and domain-specific (DSFP) paradigms for feasibility and completeness, showing that domain-specific models substantially outperform LLM-only approaches in UI task automation. Through extensive datasets and real-world case studies, ResponsibleTA reduces invalid executions and improves completion success while safeguarding user privacy via edge memory. The findings suggest that grounding LLMs with domain-specific knowledge yields higher reliability in automated task execution, with practical implications for privacy-preserving, robust copilots across diverse applications.
Abstract
The recent success of Large Language Models (LLMs) marks an impressive stride towards artificial general intelligence. LLMs have shown promise in automatically completing tasks from user instructions, functioning as brain-like coordinators. As we delegate an increasing number of tasks to machines for automated completion, the associated risks will become apparent. A central question emerges: how can we make machines behave responsibly when helping humans automate tasks as personal copilots? In this paper, we explore this question in depth from the perspectives of feasibility, completeness, and security. Specifically, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation, with three empowered capabilities: 1) predicting the feasibility of the commands for executors; 2) verifying whether executors have completed the commands; 3) enhancing security (e.g., the protection of users' privacy). We further propose and compare two paradigms for implementing the first two capabilities: one leverages the generic knowledge of LLMs themselves via prompt engineering, while the other adopts domain-specific learnable models. Moreover, we introduce a local memory mechanism for achieving the third capability. We evaluate ResponsibleTA on UI task automation and hope it will draw more attention to making LLMs more responsible in diverse scenarios.
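The three capabilities named in the abstract can be pictured as checks wrapped around a coordinator-executor loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: `LocalMemory`, `run_task`, the callback names (`plan`, `is_feasible`, `execute`, `is_complete`), the placeholder format, and the retry policy are all hypothetical and chosen only for illustration. In particular, how the framework reacts to an infeasible command is not specified in the abstract; here it is simply recorded as rejected.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class LocalMemory:
    """Hypothetical on-device store: sensitive values stay local, placeholders leave."""
    secrets: Dict[str, str] = field(default_factory=dict)

    def store(self, placeholder: str, value: str) -> None:
        self.secrets[placeholder] = value

    def mask(self, text: str) -> str:
        # Swap each sensitive value for its placeholder before text reaches the LLM.
        for placeholder, value in self.secrets.items():
            text = text.replace(value, placeholder)
        return text

    def unmask(self, text: str) -> str:
        # Restore the real values only at execution time, on the local device.
        for placeholder, value in self.secrets.items():
            text = text.replace(placeholder, value)
        return text


def run_task(
    instruction: str,
    plan: Callable[[str], List[str]],       # coordinator (e.g. an LLM), assumed
    is_feasible: Callable[[str], bool],     # feasibility predictor, assumed
    execute: Callable[[str], None],         # executor, assumed
    is_complete: Callable[[str], bool],     # completeness verifier, assumed
    memory: LocalMemory,
    max_retries: int = 1,
) -> List[Tuple[str, str]]:
    """Run planned commands with feasibility and completeness checks."""
    log: List[Tuple[str, str]] = []
    # The coordinator only ever sees the masked instruction.
    for command in plan(memory.mask(instruction)):
        if not is_feasible(command):
            # Do not hand an infeasible command to the executor.
            log.append(("rejected", command))
            continue
        for _ in range(max_retries + 1):
            execute(memory.unmask(command))
            if is_complete(command):
                log.append(("done", command))
                break
        else:
            # Every attempt ran without the verifier confirming completion.
            log.append(("failed", command))
    return log
```

Either paradigm compared in the paper (a prompted LLM or a domain-specific learned model) could in principle sit behind `is_feasible` and `is_complete`; the sketch deliberately leaves that choice to the caller.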
