Table of Contents
Fetching ...

MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits

Brandon Radosevich, John Halloran

TL;DR

This paper analyzes security risks introduced by the Model Context Protocol (MCP), showing that enabling LLMs with MCP tools can be coerced into executing malicious actions (MCE, RAC, CT). It demonstrates attack variants including direct prompts and a novel RADE retrieval-based attack that bypasses direct access requirements. To address these risks, the authors propose MCPSafetyScanner, a multi-agent framework that automatically probes, analyzes, and remediates MCP server vulnerabilities and generates security reports. The tool is demonstrated to identify exploits on standard MCP servers and provides actionable guardrail and file-system remediation guidance. The work emphasizes that safe deployment requires both robust LLM guardrails and proactive server-side design, and it invites community adoption for pre-deployment scanning.

Abstract

To reduce development overhead and enable seamless integration between potential components comprising any given generative AI application, the Model Context Protocol (MCP) (Anthropic, 2024) has recently been released and subsequently widely adopted. The MCP is an open protocol that standardizes API calls to large language models (LLMs), data sources, and agentic tools. By connecting multiple MCP servers, each defined with a set of tools, resources, and prompts, users are able to define automated workflows fully driven by LLMs. However, we show that the current MCP design carries a wide range of security risks for end users. In particular, we demonstrate that industry-leading LLMs may be coerced into using MCP tools to compromise an AI developer's system through various attacks, such as malicious code execution, remote access control, and credential theft. To proactively mitigate these and related attacks, we introduce a safety auditing tool, MCPSafetyScanner, the first agentic tool to assess the security of an arbitrary MCP server. MCPScanner uses several agents to (a) automatically determine adversarial samples given an MCP server's tools and resources; (b) search for related vulnerabilities and remediations based on those samples; and (c) generate a security report detailing all findings. Our work highlights serious security issues with general-purpose agentic workflows while also providing a proactive tool to audit MCP server safety and address detected vulnerabilities before deployment. The described MCP server auditing tool, MCPSafetyScanner, is freely available at: https://github.com/johnhalloran321/mcpSafetyScanner

MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits

TL;DR

This paper analyzes security risks introduced by the Model Context Protocol (MCP), showing that enabling LLMs with MCP tools can be coerced into executing malicious actions (MCE, RAC, CT). It demonstrates attack variants including direct prompts and a novel RADE retrieval-based attack that bypasses direct access requirements. To address these risks, the authors propose MCPSafetyScanner, a multi-agent framework that automatically probes, analyzes, and remediates MCP server vulnerabilities and generates security reports. The tool is demonstrated to identify exploits on standard MCP servers and provides actionable guardrail and file-system remediation guidance. The work emphasizes that safe deployment requires both robust LLM guardrails and proactive server-side design, and it invites community adoption for pre-deployment scanning.

Abstract

To reduce development overhead and enable seamless integration between potential components comprising any given generative AI application, the Model Context Protocol (MCP) (Anthropic, 2024) has recently been released and subsequently widely adopted. The MCP is an open protocol that standardizes API calls to large language models (LLMs), data sources, and agentic tools. By connecting multiple MCP servers, each defined with a set of tools, resources, and prompts, users are able to define automated workflows fully driven by LLMs. However, we show that the current MCP design carries a wide range of security risks for end users. In particular, we demonstrate that industry-leading LLMs may be coerced into using MCP tools to compromise an AI developer's system through various attacks, such as malicious code execution, remote access control, and credential theft. To proactively mitigate these and related attacks, we introduce a safety auditing tool, MCPSafetyScanner, the first agentic tool to assess the security of an arbitrary MCP server. MCPScanner uses several agents to (a) automatically determine adversarial samples given an MCP server's tools and resources; (b) search for related vulnerabilities and remediations based on those samples; and (c) generate a security report detailing all findings. Our work highlights serious security issues with general-purpose agentic workflows while also providing a proactive tool to audit MCP server safety and address detected vulnerabilities before deployment. The described MCP server auditing tool, MCPSafetyScanner, is freely available at: https://github.com/johnhalloran321/mcpSafetyScanner

Paper Structure

This paper contains 15 sections, 21 figures, 3 tables.

Figures (21)

  • Figure 1: Claude refusing and executing commands which enable a remote execution attack. In Figure \ref{['fig:reverseShellFail']}, Claude proceeds with caution by first decoding the octal values, notes the security risks inherent in the request's decoded command, and correctly refuses. However, Claude executes the less deceptive request, where the command to establish a remote execution attack is passed in plaintext and added to the user's run configuration file.
  • Figure 2: Llama-3.3-70B-Instruct completes an MCE attack request.Llama-3.3-70B-Instruct shows its guardrails are being partially triggered by noting malicious use cases for this command, but nonetheless completes the request. The request is highlighted in purple, while the salient portions of Llama-3.3-70B-Instruct's response are highlighted in orange. The original unhighlighted image, as well as another completed MCE attack request, may be found in Figure \ref{['fig:mceLlamaSuccess2']}.
  • Figure 3: Threat model for a RADE attack. An attacker compromises publicly available data with targeted commands centered around a specific theme ("X" in the figure), which ends up on an MCP user's system. Compromised data is then automatically added by a retrieval agent to a vector database so that, when a user requests for content related to these themes, the malicious commands are retrieved and potentially executed automatically.
  • Figure 4: Successful RADE attack for CT: From a vector database including CT directions themed around "MCP," Claude is instructed to search for entries about the MCP and perform related actions. Claude complies, completing a RAC attack and providing attackers access to the victim's system. Conversation is condensed for brevity, full conversation is displayed in Figure \ref{['fig:radeRacPart1']} and \ref{['fig:radeRacPart2']}.
  • Figure 5: Steps and agents used by the McpSafetyScanner to detect MCP server vulnerabilities and determine remediations.
  • ...and 16 more figures