Network and Systems Performance Characterization of MCP-Enabled LLM Agents

Zihao Ding, Mufeng Zhu, Yao Liu

TL;DR

Model Context Protocol (MCP) enables LLMs to orchestrate external tools, but MCP-enabled workflows incur substantial prompt overhead due to rich contextual input. The authors perform a measurement-based analysis combining OpenRouter usage traces with an instrumented MCP host (Cline) to quantify token usage, monetary cost, and latency across nine LLMs and multiple MCP configurations. They find that MCP requests are heavily prompt-dominated: the completion-to-prompt token ratio is $2\times$–$30\times$ lower than in general traffic, driven by system prompts, history, and tool observations. The study proposes optimizations such as parallel tool calls and reliable task-abort mechanisms to reduce token counts and latency, offering practical guidance for building more efficient MCP-enabled workflows.

Abstract

Model Context Protocol (MCP) has recently gained increased attention within the AI community for providing a standardized way for large language models (LLMs) to interact with external tools and services, significantly enhancing their capabilities. However, MCP-enabled LLM interactions carry extensive contextual information, including system prompts, MCP tool definitions, and context histories, which dramatically inflates token usage. Given that LLM providers charge based on tokens, these expanded contexts can quickly escalate monetary costs and increase the computational load on LLM services. This paper presents a comprehensive measurement-based analysis of MCP-enabled interactions with LLMs, revealing trade-offs between capability, performance, and cost. We explore how different LLMs and MCP configurations impact key performance metrics such as token efficiency, monetary cost, task completion times, and task success rates, and suggest potential optimizations, including enabling parallel tool calls and implementing robust task abort mechanisms. These findings provide useful insights for developing more efficient, robust, and cost-effective MCP-enabled workflows.
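
To make the token-inflation argument concrete, the following is a minimal sketch of how prompt tokens can accumulate over a multi-turn, tool-using task, and how the completion-to-prompt ratio and token-based cost follow from that. It assumes every request resends the system prompt, the tool definitions, and the full history of earlier completions and tool observations. All names (`Turn`, `simulate_task`, `cost_usd`), token counts, and per-million-token prices are hypothetical illustrations, not measurements or definitions from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Token counts for one LLM request (hypothetical record, not the paper's schema)."""
    prompt_tokens: int
    completion_tokens: int


def simulate_task(system_tokens: int, tool_def_tokens: int, user_tokens: int,
                  completion_tokens: int, observation_tokens: int,
                  n_turns: int) -> List[Turn]:
    """Assume each request resends system prompt, tool definitions, the user
    task, and all earlier completions plus tool observations (the history)."""
    turns = []
    history = 0
    for _ in range(n_turns):
        prompt = system_tokens + tool_def_tokens + user_tokens + history
        turns.append(Turn(prompt, completion_tokens))
        # The completion and the resulting tool observation join the history.
        history += completion_tokens + observation_tokens
    return turns


def completion_to_prompt_ratio(turns: List[Turn]) -> float:
    """Total completion tokens divided by total prompt tokens."""
    total_prompt = sum(t.prompt_tokens for t in turns)
    total_completion = sum(t.completion_tokens for t in turns)
    return total_completion / total_prompt


def cost_usd(turns: List[Turn], prompt_price_per_mtok: float,
             completion_price_per_mtok: float) -> float:
    """Token-based cost; prices are per one million tokens (assumed values)."""
    total_prompt = sum(t.prompt_tokens for t in turns)
    total_completion = sum(t.completion_tokens for t in turns)
    return (total_prompt * prompt_price_per_mtok
            + total_completion * completion_price_per_mtok) / 1_000_000


# Hypothetical three-turn task: illustrative numbers only.
turns = simulate_task(system_tokens=1000, tool_def_tokens=500, user_tokens=100,
                      completion_tokens=50, observation_tokens=200, n_turns=3)
ratio = completion_to_prompt_ratio(turns)
cost = cost_usd(turns, prompt_price_per_mtok=1.0, completion_price_per_mtok=2.0)
print([t.prompt_tokens for t in turns], round(ratio, 4), round(cost, 5))
```

Under these assumptions the prompt grows on every turn while the completion stays small, so the ratio shrinks as the task gets longer and the prompt side dominates the cost even when completion tokens are priced higher.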

Paper Structure

This paper contains 20 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: MCP-enabled workflow
  • Figure 2: Structure of the API requests sent from Cline (the MCP Host) to the LLM
  • Figure 3: Prompt Token vs. Completion Token
  • Figure 4: Task Difficulty vs. Total Tokens
  • Figure 5: Task Success Rate Comparisons
  • ...and 5 more figures