MindGuard: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph

Zhiqiang Wang, Junyang Zhang, Guanquan Shi, HaoRan Cheng, Yunhao Yao, Kaiwen Guo, Haohua Du, Xiang-Yang Li

TL;DR

This work identifies a critical gap in MCP security: tool poisoning that corrupts the planning context without requiring tool execution. It introduces the Decision Dependence Graph (DDG) to model LLM decision-making via attention flow and presents MindGuard, a non-invasive guardrail that offers decision-level tracking, policy-agnostic detection, and attribution. Through robust DDG construction and AIR-based anomaly analysis, MindGuard achieves high detection and attribution performance with sub-second latency and no extra token cost, generalizing across multiple LLMs and MCP server configurations. The approach reframes security from behavior-level monitoring to decision-level introspection and provides a foundation for enforcing security policies at the decision level in probabilistic, tool-integrated AI systems.

Abstract

The Model Context Protocol (MCP) is increasingly adopted to standardize the interaction between LLM agents and external tools. However, this trend introduces a new threat: Tool Poisoning Attacks (TPA), where tool metadata is poisoned to induce the agent to perform unauthorized operations. Existing defenses, which primarily focus on behavior-level analysis, are fundamentally ineffective against TPA because poisoned tools need not be executed and therefore leave no behavioral trace to monitor. Thus, we propose MindGuard, a decision-level guardrail for LLM agents that provides provenance tracking of call decisions, policy-agnostic detection, and poisoning source attribution against TPA. While fully explaining LLM decisions remains challenging, our empirical findings uncover a strong correlation between LLM attention mechanisms and tool invocation decisions. Therefore, we choose attention as an empirical signal for decision tracking and formalize this as the Decision Dependence Graph (DDG), which models the LLM's reasoning process as a weighted, directed graph where vertices represent logical concepts and edges quantify the attention-based dependencies. We further design robust DDG construction and graph-based anomaly analysis mechanisms that efficiently detect and attribute TPAs. Extensive experiments on real-world datasets demonstrate that MindGuard achieves 94%-99% average precision in detecting poisoned invocations and 95%-100% attribution accuracy, with processing times under one second and no additional token cost. Moreover, DDG can be viewed as an adaptation of the classical Program Dependence Graph (PDG), providing a solid foundation for applying traditional security policies at the decision level.
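To make the DDG idea concrete, the following is a minimal sketch of how a weighted, directed graph could be derived from an attention matrix. It is not the paper's construction: the span names, the `build_ddg` helper, and the aggregation rule (sum attention over the source tokens, then average over the target tokens) are all assumptions chosen for illustration.

```python
import numpy as np

def build_ddg(attention, spans, target):
    """Hypothetical DDG sketch.

    attention[i, j] is the attention token i pays to token j.
    spans maps a vertex name to its half-open token range (start, end).
    Returns {(source, target): weight} for every vertex other than target.
    """
    t0, t1 = spans[target]
    edges = {}
    for name, (s0, s1) in spans.items():
        if name == target:
            continue
        # Assumed aggregation: attention mass each target token places on
        # the source span, averaged over the target tokens.
        block = attention[t0:t1, s0:s1]
        edges[(name, target)] = float(block.sum(axis=1).mean())
    return edges

# Toy context of 6 tokens: a user query, one tool description, and the
# generated tool call. Only the tool-call rows matter for this sketch.
spans = {"query": (0, 2), "CommonTool": (2, 4), "tool_call": (4, 6)}
attention = np.zeros((6, 6))
attention[4] = [0.10, 0.10, 0.30, 0.30, 0.20, 0.00]
attention[5] = [0.05, 0.05, 0.40, 0.30, 0.10, 0.10]

edges = build_ddg(attention, spans, target="tool_call")
for (src, dst), weight in edges.items():
    print(f"{src} -> {dst}: {weight:.2f}")
```

In this toy matrix the tool description receives far more attention from the generated call than the user query does, which is the kind of pattern the paper associates with a poisoned invocation.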

Paper Structure

This paper contains 34 sections, 2 theorems, 9 equations, 14 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

Any tool call that passes AIR detection, i.e., $\alpha_{s,t}<\tau$, satisfies Policy-based Decision-level Security.
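The threshold test in Proposition 1 can be sketched as a simple check. How $\alpha_{s,t}$ is computed is defined in the paper and not reproduced in this summary, so the sketch below takes the scores as given inputs. The `check_call` helper, the example scores, and the value of $\tau$ are hypothetical, and treating each candidate source tool $s$ separately is an assumption.

```python
def check_call(alpha, tau):
    """Hypothetical threshold check.

    alpha maps each candidate source s to its score alpha_{s,t} for the
    tool call t under inspection. The call passes only if every score is
    strictly below tau; otherwise the offending sources are returned,
    highest score first, as attribution candidates.
    """
    flagged = sorted(
        (s for s, score in alpha.items() if score >= tau),
        key=lambda s: -alpha[s],
    )
    return len(flagged) == 0, flagged

TAU = 0.3  # illustrative threshold, not a value from the paper

benign = check_call({"ListDirectory": 0.05}, TAU)
poisoned = check_call({"CommonTool": 0.65, "ListDirectory": 0.05}, TAU)
print(benign)
print(poisoned)
```

Returning the flagged sources alongside the pass/fail result mirrors the paper's pairing of detection with attribution: the same scores that reject a call also point at the tool most likely responsible.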

Figures (14)

  • Figure 1: Tool Poisoning Attack in the MCP workflow and the MindGuard defense. MindGuard is a non-invasive plugin with three core capabilities: Decision-level Tracking (tracking the provenance of call decisions), Policy-agnostic Detecting (identifying anomalous invocations for unknown attacks), and Attributing (tracing malicious calls to poisoned tools).
  • Figure 2: TPA examples and comparison between behavior-level (tracks explicit invocation behavior) and decision-level analysis (explains who influences the invocation decision).
  • Figure 3: Different attention patterns for Poisoned Invocation (Malicious) and Normal Invocation (Benign). Tools that influence the final call demonstrate pronounced attention activation (ReadFile, CommonTool in Poisoned Invocation and ListDirectory in Normal Invocation). We aim to detect Poisoned Invocation and subsequently attribute it to the poisoned source (CommonTool).
  • Figure 4: Attention score distribution for uninvoked tools and queries. For a poisoned invocation, the poisoned tool (Poisoned Invoc. in (a)) exhibits high attention scores while the query shows low activation (Poisoned Invoc. in (b)).
  • Figure 5: System Design of MindGuard. Once the LLM generates a tool call, MindGuard parses the LLM's context (Context Parser in § 5.1) and builds a DDG from its attention matrix (DDG Builder in § 5.2). The DDG is then analyzed to detect poisoned invocations and attribute them to the poisoned source (Anomaly-aware Defender in § 5.3). Moreover, the DDG provides a concrete substrate for achieving Policy-based Decision-level Security using existing security policies (Anomaly-aware Defender in § 5.3).
  • ...and 9 more figures

Theorems & Definitions (5)

  • Definition 1: Decision-level Security
  • Definition 1.1: Policy-based Decision-level Security
  • Proposition 1
  • Proposition 1
  • proof