MCPZoo: A Large-Scale Dataset of Runnable Model Context Protocol Servers for AI Agent

Mengying Wu, Pei Chen, Geng Hong, Baichao An, Jinsong Chen, Binwang Wan, Xudong Pan, Jiarun Dai, Min Yang

TL;DR

This work addresses the lack of large-scale, runnable datasets for the Model Context Protocol (MCP) by introducing MCPZoo, a collection of 90,146 MCP servers sourced from six public platforms, of which 14,206 are runnable instances. The authors describe an end-to-end pipeline (data collection, de-duplication, code retrieval, Docker image construction, and liveness checks) that produces a curated repository with unified metadata and remote access. Key contributions are the scale (the largest to date), the runnable subset, open access with discovery interfaces, and infrastructure for reproducible security analyses and agent benchmarking in MCP ecosystems. The resource lowers deployment barriers for researchers and supports real-world experimentation, protocol studies, and ecosystem-level analysis.
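To make the de-duplication stage of the pipeline concrete, here is a minimal sketch of merging listings that point at the same repository across platforms. It is an illustration under stated assumptions, not the authors' implementation: the record fields (`name`, `repo_url`, `source`), the helper names, and the choice of a lowercased repository URL as the identity key are all hypothetical.

```python
from urllib.parse import urlparse


def normalize_repo_url(url: str) -> str:
    """Reduce a repository URL to a canonical 'host/owner/name' key.

    Assumption: two listings refer to the same server when their
    repository URLs match after lowercasing and stripping 'www.',
    a trailing slash, and a trailing '.git'.
    """
    parsed = urlparse(url.strip().lower())
    path = parsed.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    host = parsed.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return host + path


def deduplicate(records):
    """Keep one entry per canonical URL and collect every source platform."""
    merged = {}
    for record in records:
        key = normalize_repo_url(record["repo_url"])
        if key not in merged:
            # The first listing seen provides the display name.
            merged[key] = {"name": record["name"], "repo_url": key, "sources": []}
        if record["source"] not in merged[key]["sources"]:
            merged[key]["sources"].append(record["source"])
    return list(merged.values())


# Hypothetical listings of one server found on two platforms.
listings = [
    {"name": "demo", "repo_url": "https://github.com/Example/Server.git", "source": "platform-a"},
    {"name": "demo-server", "repo_url": "https://www.github.com/example/server/", "source": "platform-b"},
]
unique = deduplicate(listings)
```

Keeping the list of source platforms on each merged entry preserves provenance, which is useful when the same server is listed under different names in different places.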

Abstract

Model Context Protocol (MCP) enables agents to interact with external tools, yet empirical research on MCP is hindered by the lack of large-scale, accessible datasets. We present MCPZoo, the largest and most comprehensive dataset of MCP servers collected from multiple public sources, comprising 90,146 servers. MCPZoo includes over ten thousand server instances that have been deployed and verified as runnable and interactable, supporting realistic experimentation beyond static analysis. The dataset provides unified metadata and access interfaces, enabling systematic exploration and interaction without manual deployment effort. MCPZoo is released as an open and accessible resource to support research on MCP-based security analysis.
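One way to decide whether a deployed instance is "runnable and interactable" is to send it an MCP `initialize` request and check for a well-formed JSON-RPC 2.0 reply. The sketch below covers only message construction and response validation; transport (stdio or HTTP) is left out. It is an assumption about what such a check could look like, not the procedure used for MCPZoo, and the function names and client name are hypothetical.

```python
import json


def build_initialize_request(request_id=1):
    """Build a JSON-RPC 2.0 'initialize' request as used by MCP clients."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Hypothetical client identity for the probe.
            "clientInfo": {"name": "liveness-probe", "version": "0.1"},
        },
    }


def is_live_response(raw, request_id=1):
    """Return True if 'raw' is a successful reply to the initialize request.

    Assumption: a server counts as live when it answers with a matching id
    and a result object that carries 'serverInfo'.
    """
    try:
        message = json.loads(raw)
    except (TypeError, ValueError):
        return False
    if not isinstance(message, dict):
        return False
    if message.get("jsonrpc") != "2.0" or message.get("id") != request_id:
        return False
    result = message.get("result")
    return isinstance(result, dict) and "serverInfo" in result


# Hypothetical replies: one successful, one JSON-RPC error.
ok_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "serverInfo": {"name": "demo", "version": "1.0"},
    },
})
error_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "error": {"code": -32601, "message": "Method not found"},
})
```

A stricter check could go on to call `tools/list` after initialization, since a server that completes the handshake may still expose no usable tools.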

Paper Structure

This paper contains 10 sections, 2 tables.