TrustAgent: Towards Safe and Trustworthy LLM-based Agents

Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, Yongfeng Zhang

TL;DR

TrustAgent formalizes an Agent Constitution to constrain LLM-based agents and implements a three-stage safety pipeline (pre-, in-, and post-planning) to improve safety during planning and execution. It combines regulation learning, dynamic regulation retrieval, and a post-hoc safety inspector within a sandboxed planning framework to improve both safety and helpfulness across multiple domains. Experiments across five domains and several backbone LLMs show gains in safety and better action sequencing, and indicate that adherence to the Constitution depends on the backbone model's inherent reasoning ability. The work lays groundwork for safer, more trustworthy LLM-based agents and provides code and data for future development.

Abstract

The rise of LLM-based agents shows great potential to revolutionize task planning and has captured significant attention. Given that these agents will be integrated into high-stakes domains, ensuring their reliability and safety is crucial. This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a particular focus on improving the safety of LLM-based agents. The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: a pre-planning strategy that injects safety knowledge into the model before plan generation, an in-planning strategy that enhances safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection. Our experimental results demonstrate that the proposed framework can effectively enhance an LLM agent's safety across multiple domains by identifying and mitigating potential dangers during planning. Further analysis reveals that the framework not only improves safety but also enhances the helpfulness of the agent. Additionally, we highlight the importance of the LLM's reasoning ability in adhering to the Constitution. This paper sheds light on how to ensure the safe integration of LLM-based agents into human-centric environments. Data and code are available at https://github.com/agiresearch/TrustAgent.
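
To make the first two strategies more concrete, here is a minimal Python sketch of how safety knowledge and step-specific regulations could be placed in front of the planner. It is an illustration under stated assumptions, not the authors' implementation: the names `Regulation`, `retrieve_regulations`, and `build_prompt` are hypothetical, plain prompt prepending stands in for the pre-planning stage, and word-overlap ranking stands in for whatever retrieval method the framework actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regulation:
    """One hypothetical safety rule from an Agent Constitution."""
    domain: str
    text: str


def _tokens(s: str) -> set[str]:
    # Lowercased word set with punctuation stripped from word edges.
    return {w.strip(".,;:!?").lower() for w in s.split()} - {""}


def retrieve_regulations(step: str, rules: list[Regulation], k: int = 2) -> list[Regulation]:
    """In-planning stand-in: rank rules by word overlap with the current step.

    Assumption: overlap counting is only a placeholder for a real retriever.
    """
    step_words = _tokens(step)
    scored = [(len(step_words & _tokens(r.text)), r) for r in rules]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep at most k rules and drop those with no overlap at all.
    return [r for score, r in ranked[:k] if score > 0]


def build_prompt(task: str, step: str, safety_knowledge: str, rules: list[Regulation]) -> str:
    """Assemble a planner prompt: general safety knowledge, then retrieved rules."""
    relevant = retrieve_regulations(step, rules)
    rule_lines = "\n".join(f"- {r.text}" for r in relevant) or "- (none retrieved)"
    return (
        f"Safety knowledge:\n{safety_knowledge}\n\n"
        f"Regulations relevant to this step:\n{rule_lines}\n\n"
        f"Task: {task}\nCurrent step: {step}\n"
    )
```

Retrieving per step rather than once per task keeps the prompt short and lets the relevant rules change as the plan unfolds, which is one plausible reading of "dynamic regulation retrieval" in the TL;DR.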

Paper Structure

This paper contains 24 sections, 2 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Key considerations in the development of an Agent Constitution. The Constitution Implementation sub-figure corresponds to Figure 3.
  • Figure 2: Process diagram of the TrustAgent pipeline. It starts with an Agent Constitution, on which three safety strategies are based. A dashed line from entity A to entity B signifies that A influences the formation or operation of B, though B can still function without A. A solid line from entity A to entity B signifies that B either relies on A for its operation or is directly generated by A.
  • Figure 3: Post-planning inspection. A safety inspector checks the generated action against the safety regulations and prompts the agent to revise the action if it is found to be unsafe.
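
The inspect-and-revise cycle described in the Figure 3 caption can be sketched as a bounded loop. This is a minimal illustration, not the paper's code: `inspect_and_revise`, `toy_inspect`, `toy_revise`, the `max_rounds` cap, and the keyword-based regulation format are all assumptions. In a real agent both the inspector and the reviser would presumably be LLM calls.

```python
from typing import Callable, Optional

# Hypothetical regulation format: risky keyword -> safeguard word that must appear.
Regs = dict[str, str]


def inspect_and_revise(
    action: str,
    regulations: Regs,
    inspect: Callable[[str, Regs], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return an action that passes inspection, or None if revision keeps failing."""
    for _ in range(max_rounds):
        violation = inspect(action, regulations)
        if violation is None:
            return action  # inspector found nothing unsafe
        action = revise(action, violation)  # ask the agent for a safer action
    # The last revision has not been checked yet, so inspect it once more.
    return action if inspect(action, regulations) is None else None


def toy_inspect(action: str, regulations: Regs) -> Optional[str]:
    """Toy inspector: flag a risky keyword that lacks its safeguard word."""
    for risky, safeguard in regulations.items():
        if risky in action and safeguard not in action:
            return f"'{risky}' requires '{safeguard}'"
    return None


def toy_revise(action: str, violation: str) -> str:
    """Toy reviser: always adds a user-confirmation step in front of the action."""
    return f"ask the user to confirm, then {action}"
```

Capping the number of rounds is a design choice made here so that an agent that cannot produce a compliant action ends with a refusal (`None`) instead of looping indefinitely.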