
DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent

Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, Sen Su

TL;DR

DemonAgent introduces a stealthy backdoor attack against LLM-based agents. It dynamically encrypts the backdoor content and distributes it across multiple sub-backdoor fragments using a tiered implantation strategy. The Dynamic Encryption Mechanism (DEM) hides backdoor code in time-evolving content. Multi-Backdoor Tiered Implantation (MBTI) decomposes the backdoor into fragments and covertly distributes them through tool workflows, so the backdoor is activated only by cumulative triggering. Across diverse models, benchmarks, and domains, DEM and MBTI achieve a near-100% attack success rate with a 0% detection rate while preserving task performance, exposing significant gaps in current safety audits. The authors also release AgentBackdoorEval, a dataset spanning real-world domains for evaluating agent backdoor attacks. Overall, the work shows that advanced backdoors can evade existing audits and underscores the urgent need for robust defenses as autonomous LLM-based agents become more widely deployed.

Abstract

As LLM-based agents become increasingly prevalent, backdoors can be implanted into agents through user queries or environment feedback, raising critical concerns about safety vulnerabilities. However, backdoor attacks are typically detectable by safety audits that analyze the agent's reasoning process. To this end, we propose a novel backdoor implantation strategy called the Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content and thereby circumvents safety audits. To further enhance stealthiness, we decompose the backdoor into multiple sub-backdoor fragments. Together, these techniques allow backdoors to bypass safety audits far more effectively. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks. Experimental results across multiple datasets show that our method achieves an attack success rate approaching 100% while maintaining a detection rate of 0%, demonstrating its effectiveness in evading safety audits. Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks and underscore the urgent need for more robust defenses against backdoor threats. Code and data are available at https://github.com/whfeLingYu/DemonAgent.
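The headline results are stated in terms of attack success rate (ASR) and detection rate (DR). Below is a minimal sketch of how these two metrics might be computed over evaluation episodes. It assumes each episode records two things: whether the target backdoor behavior fired, and whether the safety audit flagged the trajectory. The `Trial` record and the function names are hypothetical. The paper's exact metric definitions, such as any per-task or per-domain averaging, may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    """One evaluation episode (hypothetical record format, not from the paper)."""
    backdoor_executed: bool  # did the target backdoor behavior fire?
    flagged_by_audit: bool   # did the safety auditor flag this trajectory?


def attack_success_rate(trials: Sequence[Trial]) -> float:
    """Fraction of episodes in which the backdoor behavior was triggered."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.backdoor_executed for t in trials) / len(trials)


def detection_rate(trials: Sequence[Trial]) -> float:
    """Fraction of episodes that the safety audit flagged as malicious."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.flagged_by_audit for t in trials) / len(trials)


# Toy episodes for illustration only (not results from the paper).
trials = [Trial(True, False), Trial(True, False), Trial(False, False), Trial(True, True)]
print(attack_success_rate(trials))  # 0.75
print(detection_rate(trials))       # 0.25
```

Under these assumed definitions, the abstract's claim corresponds to an ASR close to 1.0 and a DR of exactly 0.0. An attack is stealthy only if both numbers hold at the same time.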

Paper Structure

This paper contains 45 sections, 19 equations, 13 figures, 7 tables, 4 algorithms.

Figures (13)

  • Figure 1: Overview of our method. Step 1: Decompose the backdoor code into sub-backdoors and poison the tools. Step 2: The attacker inputs designed queries, which cause the agent to execute the task by sequentially calling the tools. Step 3: Encrypted backdoor fragments are tiered and implanted through the agent's workflow. Step 4: The backdoor code is executed via cumulative triggering.
  • Figure 2: Comparison of different attack methods by detection rate (DR) and attack success rate (ASR).
  • Figure 3: Distribution of the harmless path ratio.
  • Figure 4: Impact of encoding methods on DR performance. Left: DEM; Right: DEM + MBTI.
  • Figure 5: Impact of the number of fragments on Harmless Path Ratio.
  • ...and 8 more figures