Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, Ahmed E. Hassan

TL;DR

The paper addresses the fragmentation created by framework-specific tool calling in FM-based AI systems by studying MCP servers at scale. It applies a hybrid methodology, combining general static analysis (SonarQube) with an MCP-specific scanner (mcp-scan), to 1,899 MCP servers (official, community, mined) to assess health, security, and maintainability, using LLM-Jury to distill patterns and baselines from prior work. Key findings show that MCP servers generally sustain healthy development trajectories but exhibit security vulnerabilities (7.2% contain general vulnerabilities, 5.5% exhibit MCP-specific tool poisoning) and maintainability challenges (66% with code smells, 14.4% with bugs), indicating the need for MCP-focused tooling and governance. The results carry actionable implications for researchers, practitioners, and MCP registries seeking to improve security auditing, tooling adoption, and ecosystem governance as MCP adoption accelerates.

Abstract

Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling, triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem, and it has since become the de facto standard with over eight million weekly SDK downloads. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability, warranting closer examination. Towards this end, we present the first large-scale empirical study of MCP servers. Using state-of-the-art health metrics and a hybrid analysis pipeline, combining a general-purpose static analysis tool with an MCP-specific scanner, we evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability. Despite MCP servers demonstrating strong health metrics, we identify eight distinct vulnerabilities, only three of which overlap with traditional software vulnerabilities. Additionally, 7.2% of servers contain general vulnerabilities and 5.5% exhibit MCP-specific tool poisoning. Regarding maintainability, while 66% exhibit code smells, 14.4% contain nine bug patterns overlapping with traditional open-source software projects. These findings highlight the need for MCP-specific vulnerability detection techniques while reaffirming the value of traditional analysis and refactoring practices.

Paper Structure

This paper contains 47 sections, 7 figures, 10 tables.

Figures (7)

  • Figure 1: A motivating example of developing FM-based AI applications. In (a), Alex developed an AI application using framework A and did not need any custom tools. In (b), when enhancing an existing application written in framework B, they had to build a custom Stripe tool because B does not support A’s built-in tools. In (c), they had to re-implement the same Stripe tool with a different interface to integrate it into framework C. In contrast, (d) shows how MCP servers decouple tools from frameworks and enable interoperability, while raising new questions around sustainability, security, and maintainability.
  • Figure 2: High-level overview of MCP client-server architecture
  • Figure 3: Overview of the study design.
  • Figure 4: Vulnerability count distribution per MCP server grouped by Integration Type.
  • Figure 5: Examples of credential exposure across different code and configuration formats. Because these are sensitive credentials and keys, we have obfuscated them.
  • ...and 2 more figures