AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, Ting Wang

TL;DR

An evaluation of representative LLM agents on AgentLAB finds that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate long-horizon threats.

Abstract

LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. AgentLAB currently supports five novel attack types (intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning), spanning 28 realistic agentic environments and 644 security test cases. Leveraging AgentLAB, we evaluate representative LLM agents and find that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate long-horizon threats. We anticipate that AgentLAB will serve as a valuable benchmark for tracking progress on securing LLM agents in practical settings. The benchmark is publicly available at https://tanqiujiang.github.io/AgentLAB_main.

Paper Structure (32 sections, 1 equation, 8 figures, 4 tables, 2 algorithms)

Figures (8)

  • Figure 1: Overall framework of AgentLAB.
  • Figure 2: A multi-agent framework for long-horizon attacks.
  • Figure 3: Task Injection. Coordinated injections hijack a benign task into unauthorized Slack commands.
  • Figure 4: Memory Poisoning Attack. Hidden injections in routine content (emails, code, products) are stored as "user preferences." When a harmful request arrives, retrieved memories provide false context that disables safety filtering, causing sensitive data leakage.
  • Figure 5: Distribution of tasks across different risk categories.
  • ...and 3 more figures