The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover

Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro

TL;DR

LLM-powered agents enable advanced autonomous tasks but introduce systemic security risks. The authors evaluate 18 LLMs against three attack surfaces (Direct Prompt Injection, RAG Backdoor, and Inter-Agent Trust Exploitation) and demonstrate pervasive vulnerabilities: every tested model can be compromised through inter-agent attacks. They design and test synthetic malware payloads delivered via command pipes and RAG poisoning, quantified with the ASR, MIR, and FSR metrics, and find that larger models do not inherently resist agent-based exploits. The study argues for treating LLM agents as potentially compromised software and proposes defense directions, such as security proxies and guarded tool invocation, to mitigate the risk of complete computer takeover.

Abstract

The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables remarkable capabilities in natural language processing and generation. However, these systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromises. This paper presents a comprehensive evaluation of the security of LLMs used as reasoning engines within autonomous agents, highlighting how they can be exploited as attack vectors capable of achieving computer takeovers. We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers. We demonstrate that adversaries can effectively coerce popular LLMs into autonomously installing and executing malware on victim machines. Our evaluation of 18 state-of-the-art LLMs reveals an alarming scenario: 94.4% of models succumb to Direct Prompt Injection, and 83.3% are vulnerable to the stealthier and more evasive RAG Backdoor Attack. Notably, we tested trust boundaries within multi-agent systems, where LLM agents interact and influence each other, and found that LLMs that successfully resist direct injection or RAG backdoor attacks will execute identical payloads when requested by peer agents. We found that 100.0% of tested LLMs can be compromised through Inter-Agent Trust Exploitation attacks, and that every model exhibits context-dependent security behaviors that create exploitable blind spots.
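As a quick arithmetic check on the figures above, the following sketch converts the reported percentages back into model counts. It assumes each percentage is a fraction of the 18 evaluated models; the abstract does not state the counts explicitly, and the names `TOTAL_MODELS` and `vulnerable_count` are illustrative only.

```python
# Sketch: convert the abstract's percentages into model counts,
# ASSUMING each percentage is taken over the 18 evaluated models.
TOTAL_MODELS = 18


def vulnerable_count(percent: float, total: int = TOTAL_MODELS) -> int:
    """Return the nearest whole number of models matching a percentage."""
    return round(percent / 100 * total)


reported = {
    "Direct Prompt Injection": 94.4,
    "RAG Backdoor Attack": 83.3,
    "Inter-Agent Trust Exploitation": 100.0,
}

for attack, percent in reported.items():
    count = vulnerable_count(percent)
    # Re-derive the percentage from the count to confirm the round trip.
    check = round(count / TOTAL_MODELS * 100, 1)
    print(f"{attack}: {count}/{TOTAL_MODELS} models ({check}%)")
```

Under that assumption, the percentages correspond to 17, 15, and 18 of the 18 models respectively.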

Paper Structure

This paper contains 22 sections, 3 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: (a) generic structure of an intelligent agent and its autonomous interactions with both the environment and external tools [LLM_Agents_survey]. (b) principal attack surfaces affecting LLM agents, such as direct prompt injection and RAG-based knowledge base poisoning, with an example exploit of the agent's workflow [zhang2025agentsecuritybenchasb].
  • Figure 2: Synthetic applications architecture. (a) Direct Prompt Injection Attack: an attacker directly sends a malicious prompt containing a command pipe to an LLM agent equipped with terminal access via the run_command tool. (b) RAG Backdoor Attack: a benign user queries an agentic RAG system that retrieves an attacker-poisoned document from its knowledge base, which triggers malicious behavior during the reasoning phase. (c) Inter-Agent Trust Exploitation Attack within a multi-agent system: the calling agentic RAG retrieves malicious instructions from a compromised knowledge base and propagates them to the invoked agent, which executes the command pipe via the run_command tool.
  • Figure 3: Inter-Agent Trust Exploitation attack: the calling agent transmits a malicious command pipe to a peer agent. No additional adversarial framing is required; the invoked agent simply receives the raw command pipe from its peer and executes it via the run_command tool.
  • Figure 4: LLM agent constructing a command injection payload. An agent with a restricted ping tool receives instructions containing command separators (';') and arbitrary shell commands. The LLM generates the complete malicious payload without recognizing the attack pattern, bypassing intended tool restrictions through command injection.
  • Figure 5: Attack evaluation metrics across Direct Prompt Injection (DPI), RAG Backdoor Attack (RBA), and Inter-Agent Trust Exploitation (IATE).
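
The command-injection scenario in the Figure 4 caption, together with the "guarded tool invocation" defense direction named in the TL;DR, can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`is_safe_host`, `build_ping_argv`, `run_ping`) are hypothetical, the validation rules are one possible policy, and the `-c` flag assumes a Linux- or macOS-style `ping`. The idea is that a restricted ping tool validates the LLM-supplied argument and never passes it through a shell, so separators such as ';' are rejected rather than interpreted.

```python
import ipaddress
import re
import subprocess

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_safe_host(host: str) -> bool:
    """Accept only a literal IP address or a plain dotted hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        pass
    if not 1 <= len(host) <= 253:
        return False
    # fullmatch rejects spaces, ';', '|', '&', newlines, and leading '-'.
    return all(_LABEL.fullmatch(label) for label in host.split("."))


def build_ping_argv(host: str, count: int = 1) -> list[str]:
    """Build an argument vector for ping, or raise ValueError."""
    if not is_safe_host(host):
        raise ValueError(f"rejected host argument: {host!r}")
    if not 1 <= count <= 10:
        raise ValueError("count must be between 1 and 10")
    # '-c' is the packet-count flag on Linux/macOS ping (assumption).
    return ["ping", "-c", str(count), host]


def run_ping(host: str) -> subprocess.CompletedProcess:
    """Run ping with a list argv and no shell, so nothing is re-parsed."""
    argv = build_ping_argv(host)
    return subprocess.run(argv, capture_output=True, text=True,
                          timeout=10, check=False)
```

With this guard, an argument such as `8.8.8.8; cat /etc/passwd` raises `ValueError` before any process is started, whereas a tool that interpolates the same string into a shell command line would execute both commands. Input validation of this kind addresses only the tool-argument channel; it does not by itself prevent an agent with unrestricted terminal access from being coerced.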