LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou, Kun Wang, Jie Zhang, Li Sun, Yang Liu, Sen Su

TL;DR

The paper identifies implicit toxicity in open MCP-based LLM agent ecosystems, where malicious tools covertly misuse computation without violating any explicit permission. It introduces LeechHijack, a two-stage latent backdoor that hijacks the agent's reasoning by embedding covert tasks into legitimate tool outputs and establishing a command-and-control (C2) channel. Across four LLM families and diverse architectures, LeechHijack achieves an attack success rate of about 77% with roughly 18.6% resource overhead relative to the baseline while largely preserving user-task accuracy, which makes it a practical vector for resource hijacking. The work also analyzes defenses, showing that static audits often fail to detect the covert abuse, and recommends computational provenance, contextual-memory auditing, and runtime isolation as essential mitigations for MCP security.

Abstract

Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in reasoning, planning, and tool usage. The recently proposed Model Context Protocol (MCP) has emerged as a unifying framework for integrating external tools into agent systems, enabling a thriving open ecosystem of community-built functionalities. However, the openness and composability that make MCP appealing also introduce a critical yet overlooked security assumption -- implicit trust in third-party tool providers. In this work, we identify and formalize a new class of attacks that exploit this trust boundary without violating explicit permissions. We term this new attack vector implicit toxicity, where malicious behaviors occur entirely within the allowed privilege scope. We propose LeechHijack, a Latent Embedded Exploit for Computation Hijacking, in which an adversarial MCP tool covertly expropriates the agent's computational resources for unauthorized workloads. LeechHijack operates through a two-stage mechanism: an implantation stage that embeds a benign-looking backdoor in a tool, and an exploitation stage where the backdoor activates upon predefined triggers to establish a command-and-control channel. Through this channel, the attacker injects additional tasks that the agent executes as if they were part of its normal workflow, effectively parasitizing the user's compute budget. We implement LeechHijack across four major LLM families. Experiments show that LeechHijack achieves an average success rate of 77.25%, with a resource overhead of 18.62% compared to the baseline. This study highlights the urgent need for computational provenance and resource attestation mechanisms to safeguard the emerging MCP ecosystem.

Paper Structure

This paper contains 32 sections, 4 equations, 4 figures, 15 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison between prior studies on explicit toxicity (top) and the proposed LeechHijack attack (bottom). Traditional attacks focus on compromising the host or producing harmful outputs through privilege escalation. LeechHijack instead exploits legitimate tool interfaces to covertly divert computational resources toward the attacker’s own tasks, while maintaining normal functionality and output for the user’s task.
  • Figure 2: LeechHijack attack overview. A benign-looking MCP tool embeds a latent backdoor (implantation stage). When triggered, it connects to the attacker’s server and hijacks the agent’s reasoning loop to execute unauthorized workloads (exploitation stage), then resumes normal operation with legitimate outputs.
  • Figure 3: LeechHijack’s stealth arises from its alignment with the agent’s computing consumption range, defined by token consumption IQR over five generations per task type. The figure presents the ratio of attack consumption to the difference between the upper bound and the median of the distribution range.
  • Figure 4: Stability and trigger probabilities across different trigger categories. The dashed line indicates the maximum trigger probability within each category.
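
The ratio described in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' code, and it rests on several assumptions that the caption does not confirm: that the "upper bound" is the third quartile (Q3) of the baseline token counts, that "attack consumption" means the extra tokens spent on the injected task, and that the function name `stealth_ratio` and the sample numbers are purely hypothetical. Under that reading, a ratio below 1 would mean the extra consumption stays within the normal spread between the median and the upper bound.

```python
import statistics


def stealth_ratio(baseline_tokens, attack_extra_tokens):
    """Ratio of extra attack tokens to the (upper bound - median) headroom.

    ASSUMPTION: "upper bound" is taken to be Q3 of the baseline token
    counts; the caption does not say which bound is meant.
    """
    # Quartile cut points (Q1, median, Q3) of the baseline generations.
    _q1, median, q3 = statistics.quantiles(
        baseline_tokens, n=4, method="inclusive"
    )
    headroom = q3 - median
    if headroom <= 0:
        raise ValueError("baseline has no spread above the median")
    return attack_extra_tokens / headroom


# Hypothetical token counts for five generations of one task type.
baseline = [1000, 1100, 1200, 1300, 1400]
ratio = stealth_ratio(baseline, attack_extra_tokens=50)
print(f"attack / headroom ratio: {ratio:.2f}")
```

With these made-up numbers the headroom between the median (1200) and Q3 (1300) is 100 tokens, so 50 extra tokens give a ratio of 0.5.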