TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection

Jiahao Liu, Bonan Ruan, Xianglin Yang, Zhiwei Lin, Yan Liu, Yang Wang, Tao Wei, Zhenkai Liang

TL;DR

TraceAegis introduces a provenance-based framework for securing LLM-based agents by learning hierarchical execution structures and generalized behavioral constraints from normal tool-invocation traces. By comparing new traces against both structural and semantic constraints, it detects anomalies without requiring access to internal prompts. The authors release TraceAegis-Bench, covering healthcare and procurement scenarios, and demonstrate that TraceAegis outperforms diverse foundation-model baselines and identifies real-world red-team attacks with strong accuracy. This approach offers a scalable, interpretable defense against subtle, long-horizon anomalies in complex agent workflows.

Abstract

LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of attacks, such as tool poisoning and malicious instructions, that compromise their execution flow and can lead to serious consequences like data breaches and financial loss. Existing studies typically attempt to mitigate such anomalies by predefining specific rules and enforcing them at runtime to enhance safety. Yet, designing comprehensive rules is difficult, requiring extensive manual effort and still leaving gaps that result in false negatives. As agent systems evolve into complex software systems, we take inspiration from software system security and propose TraceAegis, a provenance-based analysis framework that leverages agent execution traces to detect potential anomalies. In particular, TraceAegis constructs a hierarchical structure to abstract stable execution units that characterize normal agent behaviors. These units are then summarized into constrained behavioral rules that specify the conditions necessary to complete a task. By validating execution traces against both hierarchical and behavioral constraints, TraceAegis is able to effectively detect abnormal behaviors. To evaluate the effectiveness of TraceAegis, we introduce TraceAegis-Bench, a dataset covering two representative scenarios: healthcare and corporate procurement. Each scenario includes 1,300 benign behaviors and 300 abnormal behaviors, where the anomalies either violate the agent's execution order or break the semantic consistency of its execution sequence. Experimental results demonstrate that TraceAegis achieves strong performance on TraceAegis-Bench, successfully identifying the majority of abnormal behaviors.
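The two checks described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm. It assumes a trace is just an ordered list of tool names, treats consecutive tool pairs as stand-ins for "execution units", and treats "tools that always ran earlier" as a stand-in for behavioral constraints. The tool names and the helper functions `learn_units`, `learn_preconditions`, and `check_trace` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Trace = List[str]  # assumption: a trace is the ordered list of tool names in one run


def learn_units(traces: List[Trace]) -> Set[Tuple[str, str]]:
    """Collect every consecutive tool pair observed in benign traces."""
    units: Set[Tuple[str, str]] = set()
    for trace in traces:
        units.update(zip(trace, trace[1:]))
    return units


def learn_preconditions(traces: List[Trace]) -> Dict[str, Set[str]]:
    """For each tool, keep only the tools that preceded it in every benign trace."""
    pre: Dict[str, Set[str]] = {}
    for trace in traces:
        seen: Set[str] = set()
        for tool in trace:
            if tool in pre:
                pre[tool] &= seen  # intersect with what came before this time
            else:
                pre[tool] = set(seen)
            seen.add(tool)
    return pre


def check_trace(
    trace: Trace,
    units: Set[Tuple[str, str]],
    pre: Dict[str, Set[str]],
) -> List[str]:
    """Return human-readable findings; an empty list means no anomaly was flagged."""
    findings: List[str] = []
    # Structural check: has each transition been seen in benign data?
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in units:
            findings.append(f"unseen transition: {a} -> {b}")
    # Behavioral check: did every learned prerequisite run earlier in this trace?
    seen: Set[str] = set()
    for tool in trace:
        missing = pre.get(tool, set()) - seen
        if missing:
            findings.append(f"{tool} missing prerequisite(s): {sorted(missing)}")
        seen.add(tool)
    return findings


# Hypothetical benign traces for a clinic triage agent.
benign = [
    ["parse_symptoms", "lookup_department", "recommend"],
    ["parse_symptoms", "check_history", "lookup_department", "recommend"],
]
units = learn_units(benign)
pre = learn_preconditions(benign)

skipped_lookup = check_trace(["parse_symptoms", "recommend"], units, pre)
skipped_parse = check_trace(["lookup_department", "recommend"], units, pre)
```

In this toy setting, `skipped_parse` contains only known transitions, so the structural check alone would pass it, while the prerequisite check still flags it. That is one way to see why validating against both kinds of constraint can catch more than either alone. The framework described in the abstract works over a hierarchical structure and richer conditions than plain ordering, which this sketch does not attempt to model.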

Paper Structure

This paper contains 25 sections, 8 equations, 4 figures, 5 tables, and 2 algorithms.

Figures (4)

  • Figure 1: An MCP-based Clinic Triage Agent. Given a user input, the agent parses the symptom description and recommends appropriate medical departments. A gateway component is integrated to record interaction data, including input parameters and responses from MCP tools.
  • Figure 2: Overview of TraceAegis. (Top) TraceAegis first reconstructs the hierarchical structure. It then extracts fine-grained paths as execution units and summarizes constrained behaviors guarded with specific conditions, serving as anomaly detection proxies. (Bottom) During the detection phase, TraceAegis checks whether each execution unit has previously appeared, and then compares it against the constrained behaviors to determine whether the conditions are satisfied.
  • Figure 3: Behavior rules with/without Hierarchical Structure.
  • Figure 4: An example of attacks observed during the red-teaming process.

Theorems & Definitions (3)

  • Definition 4.1: Hierarchical Dominance
  • Definition 4.2: Intra-Level Interchangeability
  • Definition 4.3: Monotonicity