How are AI agents used? Evidence from 177,000 MCP tools

Merlin Stein

Abstract

Today's AI agents are built on large language models (LLMs) equipped with tools to access and modify external environments, such as corporate file systems, API-accessible platforms and websites. AI agents offer the promise of automating computer-based tasks across the economy. However, developers, researchers and governments lack an understanding of how AI agents are currently being used, and for what kinds of (consequential) tasks. To address this gap, we evaluated 177,436 agent tools created from 11/2024 to 02/2026 by monitoring public Model Context Protocol (MCP) server repositories, the current predominant standard for agent tools. We categorise tools according to their direct impact: perception tools to access and read data, reasoning tools to analyse data or concepts, and action tools to directly modify external environments, like file editing, sending emails or steering drones in the physical world. We use O*NET mapping to identify each tool's task domain and consequentiality. Software development accounts for 67% of all agent tools, and 90% of MCP server downloads. Notably, the share of 'action' tools rose from 27% to 65% of total usage over the 16-month period sampled. While most action tools support medium-stakes tasks like editing files, there are action tools for higher-stakes tasks like financial transactions. Using agentic financial transactions as an example, we demonstrate how governments and regulators can use this monitoring method to extend oversight beyond model outputs to the tool layer to monitor risks of agent deployment.

Paper Structure (53 sections, 9 figures, 7 tables)

This paper contains 53 sections, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Monitoring 177k MCP tools. Panel A illustrates how we curate and classify 177k MCP tools from GitHub and Smithery using a human-validated LLM judge, along the O*NET (O*NET OnLine) and US CAISI (CAISI 2025) taxonomies, into generality (general-/narrow-purpose), direct impact (action/perception/reasoning), and task domain (Sections \ref{sec:data} and \ref{sec:methodology}). $\kappa$ is Fleiss' kappa across 14 expert validators ($n=100$ each; Appendix \ref{app:validation}). Panel B shows cumulative MCP tools over time: all tools (black) by creation date and AI-coauthored tools (blue) by the month of first labelled AI evidence in an MCP repository (Method & results in Sections \ref{sec:ai-detection-method} & \ref{sec:ai-created-results}). Panel C shows cumulative tools by O*NET task domain. 'Other' (dashed) aggregates remaining domains (Method & results in Sections \ref{sec:topic-modelling} & \ref{sec:res-task-domains}). Panel D shows monthly aggregate downloads of MCP server tools, proxying usage. Dots are monthly totals for all servers (black) and AI-coauthored servers (blue); downloads are attributed to the AI-coauthored series only from the date of first detected AI evidence. Lines show exponential fits $y = A\,e^{kt}$ (Nov 2024--Feb 2026). The legend reports the doubling time $\ln 2 / k$. Shading shows 95% bootstrap confidence intervals. Panel E shows the share of monthly downloads by the type of tools, specifically the set of actions a tool allows an agent to take ('action space'). Dots are monthly shares. "Action tools" (light pink) denotes all tools classified as action. Three subcategories cross-classify action tools by generality ("general-purpose," red) and O*NET occupational-impact stakes ("medium-stakes" 50--75, dark red; "high-stakes" 75--100, black, on the 0--100 O*NET impact-of-decisions scale). Subcategory shares do not sum to the action total as dimensions are independent. Lines show WLS fits: asymptotic convergence for action and medium-stakes (95% error-propagation CI), and polynomial convergence for general-purpose and high-stakes (95% bootstrap CI). Method & results in Sections \ref{sec:direct-impact}, \ref{sec:generality}, \ref{sec:res-directimpact} & \ref{sec:res-generality}.
  • Figure 2: Consequentiality distribution of AI agent actions. The figure shows the stakes of computer-based occupations and the number of tools related to each occupation. Each dot represents one SOC-O*NET occupation. Occupation stakes are based on an O*NET survey (O*NET OnLine) asking employees to rate 'What results do your decisions usually have on other people or the image or reputation or financial resources of your employer?' on a scale of 0-100. Absolute values are imprecise and not meaningful on their own, so the axis labels are omitted. The y-axis (log scale) shows the number of published AI agent action tools mapped to each occupation. The dashed curve shows a quadratic polynomial fit. The fit explains little of the cross-occupation variance ($R^2 \approx 0.03$; an F-test rejects that all slope coefficients are jointly zero, $p = 0.015$), reflecting substantial heterogeneity. The pink-shaded region highlights high-stakes occupations (score $> 75$) with many tools. Occupations without any associated agent tools, which are near-exclusively non-computer-based occupations, are excluded.
  • Figure 3: Geographic distribution of AI agent actions. Colour intensity shows each country's share of worldwide PyPI downloads of MCP servers with action tools, 11/2024 to 10/2025; brackets give the percentage-point change in share from H1 to H2 2025. Download data by geography are available for 528 MCP servers with action tools ($N = 6.73$M downloads); of 11,174 MCP servers with action tools, 2,467 have download data (see Section \ref{sec:geo-tools}). Country codes: CA (Canada), US (United States), SE (Sweden), GB (Great Britain/United Kingdom), NL (Netherlands), DE (Germany), FR (France), KR (South Korea), CN (China), JP (Japan), TW (Taiwan), IN (India), SG (Singapore), AU (Australia).
  • Figure 4: AI agent tool usage for perception, reasoning and action. Stacked area chart showing the percentage of monthly tool downloads on PyPI and NPM (y-axis) by tool functionality subcategory, from Nov 2024 to Jan 2026 (x-axis; $n$ indicates total monthly downloads). Stacked areas are grouped into three direct impact types (bottom to top): Action (red shades), Reasoning (blue shades), and Perception (grey). Parenthetical percentages in the legend show each subcategory's overall download share across all months. Subcategory definitions follow the taxonomy in Section \ref{sec:direct-impact}; software extensions are tools for specific software packages and APIs, code execution covers command-line tools (e.g., a bash tool), and computer use includes tools for mouse-based computer control, browser automation, and GUI interaction. The black trend line shows an asymptotic convergence model $y(t) = L - (L - y_0)\,e^{-kt}$ fitted via weighted least squares (by monthly downloads). The asymptotic limit $L$ and 95% confidence interval (from the parameter covariance matrix) are on the legend; grey lines show 95% CI of overall trend. LLM classification validated by human experts (78% agreement, see Appendix \ref{app:validation}). Download data is on server level, allocated assuming 1 server install $=$ 1 use of every tool on the server.
  • Figure 5: General-purpose tool share over time. Top: general-purpose share of total monthly downloads ($n$ = total npm/PyPI downloads per month); bottom: general-purpose share of cumulative published servers (# = cumulative server count). Dots are observed monthly values. Curves: polynomial-convergence model $y = L - \exp(a + bt + ct^2)$ fitted via WLS. Shading: 95% confidence intervals, wild bootstrap (top) and standard covariance-based (bottom). Download-weighted general-purpose share converges toward $L = 55\%$ [50%, 100%] ($R^2 = 0.84$); the count-based share remains stable near $L = 21\%$ [21%, 21%] ($R^2 \approx 0$).
  • ...and 4 more figures
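
The trend models named in the captions above (the exponential fit $y = A\,e^{kt}$ with doubling time $\ln 2 / k$ from Figure 1, and the asymptotic convergence model $y(t) = L - (L - y_0)\,e^{-kt}$ from Figure 4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' fitting code: the log-linear least-squares fit is an assumed simplification (the captions do not say how the exponential was fitted, and the paper's other fits use weighting and bootstrap intervals that are omitted here), all function names are hypothetical, and every number is synthetic.

```python
import math


def fit_exponential(ts, ys):
    """Fit y = A * exp(k * t) by ordinary least squares on ln(y).

    Assumption: a log-linear fit stands in for whatever procedure the
    paper used. Requires all ys > 0.
    """
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (v - log_mean) for t, v in zip(ts, logs))
    k = sxy / sxx                          # growth rate per unit of t
    A = math.exp(log_mean - k * t_mean)    # value at t = 0
    return A, k


def doubling_time(k):
    """Doubling time ln(2) / k, as reported in the Figure 1 legend."""
    return math.log(2) / k


def asymptotic_share(t, L, y0, k):
    """Asymptotic convergence model y(t) = L - (L - y0) * exp(-k * t)."""
    return L - (L - y0) * math.exp(-k * t)


# Synthetic monthly downloads that double every 3 months (not paper data).
months = [0, 1, 2, 3, 4, 5]
downloads = [100 * 2 ** (t / 3) for t in months]

A, k = fit_exponential(months, downloads)
print("fitted A:", A)
print("doubling time (months):", doubling_time(k))

# Hypothetical parameters: starting share 0.3, limiting share 0.7.
for t in (0, 6, 12, 24):
    print(t, asymptotic_share(t, L=0.7, y0=0.3, k=0.2))
```

In the convergence model, $y(0) = y_0$ and $y(t) \to L$ as $t$ grows, which is why the captions can report $L$ as a projected long-run share.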