
AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Haoyu Wang, Christopher M. Poskitt, Jun Sun

TL;DR

AgentSpec introduces a domain-specific language for runtime enforcement that constrains LLM-driven agents, addressing safety, security, and ethical risks that static or post-hoc safeguards struggle to handle. AgentSpec hooks into the agent's decision pipeline and lets users write rules built from triggers, predicates, and enforcement actions. This lets it enforce safety at runtime across code execution, embodied robotics, and autonomous driving. In the reported experiments, it prevents unsafe executions in over 90% of code agent cases, achieves 100% law compliance for autonomous vehicles (AVs), and eliminates all hazardous actions in embodied tasks, with overhead on the order of milliseconds. The authors also show that LLM-generated rules are viable: rules generated by OpenAI o1 reach 95.56% precision and 70.96% recall for embodied agents. The framework is modular, framework-agnostic, and open source, offering a practical path toward auditable, scalable safety for diverse LLM agent deployments.
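To make "hooking into the agent's decision pipeline" concrete, here is a minimal, hypothetical Python sketch of the general idea: every action the agent proposes passes through a set of checks before it is executed, and a check can let the action through, halt the agent, or swap in a safer action. None of this is AgentSpec's actual interface. The names `run_enforced`, `Verdict`, `block_recursive_delete`, and `redact_key_reads`, the `(tool, argument)` action representation, and the three outcomes `allow`/`stop`/`replace` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical action representation: (tool name, argument string).
Action = Tuple[str, str]


@dataclass
class Verdict:
    # Assumed outcomes: "allow", "stop" (halt the agent), "replace" (swap the action).
    kind: str
    replacement: Optional[Action] = None


# A checker inspects one proposed action before it runs.
Checker = Callable[[Action], Verdict]


def run_enforced(proposed: List[Action], checkers: List[Checker]) -> List[Action]:
    """Pass each proposed action through the checkers before 'executing' it.

    Returns the trace of actions that would actually be executed.
    """
    executed: List[Action] = []
    for action in proposed:
        verdict = Verdict("allow")
        for check in checkers:
            verdict = check(action)
            if verdict.kind != "allow":
                break  # assumption: the first non-allow verdict wins
        if verdict.kind == "stop":
            break  # halt before the unsafe action and everything after it
        if verdict.kind == "replace" and verdict.replacement is not None:
            action = verdict.replacement
        executed.append(action)  # stand-in for invoking the real tool
    return executed


def block_recursive_delete(action: Action) -> Verdict:
    # Hypothetical check: stop the agent before any recursive delete.
    tool, arg = action
    if tool == "shell" and "rm -rf" in arg:
        return Verdict("stop")
    return Verdict("allow")


def redact_key_reads(action: Action) -> Verdict:
    # Hypothetical check: replace reads of SSH keys with a harmless echo.
    tool, arg = action
    if tool == "shell" and ".ssh/" in arg:
        return Verdict("replace", ("shell", "echo '[blocked: private key access]'"))
    return Verdict("allow")


plan = [
    ("shell", "ls"),
    ("shell", "cat ~/.ssh/id_rsa"),
    ("shell", "rm -rf /tmp/work"),
    ("shell", "echo done"),
]
trace = run_enforced(plan, [block_recursive_delete, redact_key_reads])
print(trace)
```

Here the key read is replaced and the `rm -rf` step halts the run, so `echo done` never executes. Placing the check between planning and execution is what makes this runtime enforcement rather than post-hoc filtering: the unsafe action is intercepted before it has any effect.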

Abstract

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and achieves 100% law compliance for autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overhead on the order of milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and a recall of 70.96% for embodied agents, identify 87.26% of the risky code for code agents, and prevent AVs from breaking laws in 5 out of 8 scenarios.
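The rule structure described above (a trigger, a set of predicates, and an enforcement mechanism) can be illustrated with a small Python sketch. This is not AgentSpec's DSL syntax or API. The names `Rule`, `violated`, `check_action`, the `transfer_funds` tool, the `user_inspection` label, and the 1000 threshold are all hypothetical, and the sketch assumes a rule is violated when its trigger matches the proposed action and all of its predicates hold.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical representation of a proposed agent action: tool name plus arguments.
Action = Dict[str, Any]


@dataclass
class Rule:
    name: str
    trigger: str  # tool/event name that activates the rule (assumption)
    predicates: List[Callable[[Action], bool]] = field(default_factory=list)
    enforcement: str = "stop"  # label for what to do on violation (hypothetical)


def violated(rule: Rule, action: Action) -> bool:
    """Assumed semantics: the trigger matches and every predicate holds."""
    return action["tool"] == rule.trigger and all(p(action) for p in rule.predicates)


def check_action(rules: List[Rule], action: Action) -> Optional[str]:
    """Return the enforcement of the first violated rule, or None to let the action run."""
    for rule in rules:
        if violated(rule, action):
            return rule.enforcement
    return None


# Hypothetical rule in the spirit of inspecting transactions:
# large transfers are routed to a human for review.
inspect_transfer = Rule(
    name="inspect_large_transfer",
    trigger="transfer_funds",
    predicates=[lambda a: a["args"].get("amount", 0) > 1000],
    enforcement="user_inspection",
)

print(check_action([inspect_transfer], {"tool": "transfer_funds", "args": {"amount": 5000}}))  # user_inspection
print(check_action([inspect_transfer], {"tool": "transfer_funds", "args": {"amount": 50}}))  # None
print(check_action([inspect_transfer], {"tool": "search", "args": {}}))  # None
```

Separating the trigger from the predicates keeps rule evaluation cheap: predicates are only evaluated for actions whose trigger matches, which is consistent with (though not a reproduction of) the low overhead the abstract reports.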

Paper Structure

This paper contains 22 sections, 5 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: A demonstrative example of the enforced LLM agent
  • Figure 2: Example rule for inspecting transactions
  • Figure 3: Abstract syntax of AgentSpec programs
  • Figure 4: Overall workflow of an AgentSpec-enforced LangChain agent
  • Figure 5: Rule for inspecting print content from untrusted sources
  • ...and 3 more figures

Theorems & Definitions (3)

  • Definition 3.1: AgentSpec Rule
  • Definition 3.2: AgentSpec Rule Violation
  • Definition 3.3: AgentSpec Semantics