SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs
Anbang Ye, Qianran Ma, Jia Chen, Muqi Li, Tong Li, Fujiao Liu, Siqi Mai, Meichen Lu, Haitao Bao, Yang You
TL;DR
This paper addresses two weaknesses of general-purpose AI agents, limited long-horizon planning and poor use of domain-specific knowledge, by introducing the SOP-agent, a framework built on pseudocode-style Standard Operational Procedures (SOPs) written in natural language. Each SOP is represented as a decision graph that the agent traverses to complete the tasks the SOP specifies, which lets domain-specific guidance and human expertise steer the agent. Across decision-making, search and reasoning, code generation, data cleaning, and grounded customer service tasks, the SOP-agent outperforms general-purpose agent frameworks and performs comparably to domain-specific agent systems. The paper also proposes the Grounded Customer Service Benchmark to evaluate the grounded decision-making of SOP-guided agents in customer service scenarios. Overall, the approach offers a practical way to quickly build domain-aware agents from human-curated SOPs and decision-graph guidance.
Abstract
Despite significant advancements in general-purpose AI agents, several challenges still hinder their practical application in real-world scenarios. First, the limited planning capabilities of Large Language Models (LLMs) restrict AI agents from effectively solving complex tasks that require long-horizon planning. Second, general-purpose AI agents struggle to efficiently utilize domain-specific knowledge and human expertise. In this paper, we introduce the Standard Operational Procedure-guided Agent (SOP-agent), a novel framework for constructing domain-specific agents through pseudocode-style Standard Operational Procedures (SOPs) written in natural language. Formally, we represent an SOP as a decision graph, which is traversed to guide the agent in completing tasks specified by the SOP. We conduct extensive experiments across tasks in multiple domains, including decision-making, search and reasoning, code generation, data cleaning, and grounded customer service. The SOP-agent demonstrates excellent versatility, achieving performance superior to general-purpose agent frameworks and comparable to domain-specific agent systems. Additionally, we introduce the Grounded Customer Service Benchmark, the first benchmark designed to evaluate the grounded decision-making capabilities of AI agents in customer service scenarios based on SOPs.
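
To make the decision-graph idea concrete, the following is a minimal sketch and not the authors' implementation, which this section does not show. It assumes each node carries an optional action and an ordered list of (condition, child) edges, and it uses plain Python predicates where the SOP-agent would rely on an LLM to judge which branch applies. The names `SOPNode`, `traverse`, and `build_refund_sop`, as well as the refund scenario and its 30-day rule, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, object]


@dataclass
class SOPNode:
    """One step of a hypothetical SOP decision graph."""
    name: str
    # Assumed action signature: reads and updates the shared observation dict.
    action: Optional[Callable[[Observation], None]] = None
    # Outgoing edges as (condition, child) pairs, checked in order.
    branches: List[Tuple[Callable[[Observation], bool], "SOPNode"]] = field(
        default_factory=list
    )

    def add_branch(self, condition, child):
        self.branches.append((condition, child))
        return child


def traverse(root: SOPNode, obs: Observation) -> List[str]:
    """Walk the graph from root, running each action and following the
    first branch whose condition holds. Assumes the graph is acyclic."""
    trace = []
    node = root
    while node is not None:
        trace.append(node.name)
        if node.action is not None:
            node.action(obs)
        # A predicate stands in for the LLM's branch decision here.
        node = next((child for cond, child in node.branches if cond(obs)), None)
    return trace


def build_refund_sop() -> SOPNode:
    """Toy customer-service SOP: refund recent orders, escalate the rest."""
    root = SOPNode("check_order")
    refund = SOPNode("issue_refund", action=lambda o: o.update(outcome="refunded"))
    escalate = SOPNode(
        "escalate_to_human", action=lambda o: o.update(outcome="escalated")
    )
    root.add_branch(lambda o: o["days_since_purchase"] <= 30, refund)
    root.add_branch(lambda o: True, escalate)  # fallback branch
    return root


if __name__ == "__main__":
    obs = {"days_since_purchase": 10}
    print(traverse(build_refund_sop(), obs), obs["outcome"])
```

Ordering the branches and ending with an always-true fallback keeps the traversal deterministic in this sketch; an LLM-driven agent would instead select among the candidate branches based on the observation.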
