From Tool Orchestration to Code Execution: A Study of MCP Design Choices

Yuval Felendler, Parth A. Gandhi, Idan Habler, Yuval Elovici, Asaf Shabtai

TL;DR

This work compares traditional context-coupled MCP with Code Execution MCP (CE-MCP), showing that CE-MCP can substantially reduce token usage, latency, and interaction turns by collapsing tool orchestration into a single executable program run in a sandbox. It applies the MAESTRO threat-modeling framework to analyze security across the CE-MCP execution phases, demonstrates adversarial attacks at multiple phases, and introduces layered defenses (pre-, during-, and post-execution) to mitigate the risks. The findings indicate that CE-MCP is not universally superior: its effectiveness depends on task structure, and it excels in data-driven, multi-tool workflows while introducing new security challenges that require robust runtime governance and semantic validation. Overall, the paper argues for a hybrid, task-aware deployment of MCP architectures, with explicit security controls focused on execution semantics rather than on prompt-level filtering alone.
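To make the contrast concrete, the following is a minimal sketch, not the paper's implementation. The tools `list_orders` and `total_amount` and the `context` list are hypothetical stand-ins for MCP server tools and the model's context window. The `exec` call only marks where a sandbox would sit and provides no real isolation.

```python
from typing import Callable, Dict, List

# Hypothetical tools standing in for MCP server tools (not from the paper).
def list_orders(customer: str) -> List[dict]:
    return [{"customer": customer, "amount": a} for a in (120, 80, 40)]

def total_amount(orders: List[dict]) -> int:
    return sum(o["amount"] for o in orders)

TOOLS: Dict[str, Callable] = {
    "list_orders": list_orders,
    "total_amount": total_amount,
}

def _report(answer: int, context: List[str]) -> dict:
    # Summarize how much material passed through the (simulated) model context.
    return {
        "answer": answer,
        "turns": len(context),
        "context_chars": sum(len(entry) for entry in context),
    }

def context_coupled(customer: str) -> dict:
    """Tool-by-tool: every intermediate result is appended to the context."""
    context: List[str] = []
    orders = TOOLS["list_orders"](customer)
    context.append(repr(orders))   # turn 1: raw rows enter the context
    total = TOOLS["total_amount"](orders)
    context.append(repr(total))    # turn 2: the total enters the context
    return _report(total, context)

def context_decoupled(customer: str) -> dict:
    """CE-MCP style: one program runs out of band; only its result returns."""
    program = "result = total_amount(list_orders(customer))"
    scope = {**TOOLS, "customer": customer}
    # Placeholder for a sandboxed runtime; exec() here is NOT real isolation.
    exec(program, {"__builtins__": {}}, scope)
    context = [repr(scope["result"])]  # single turn: only the final value
    return _report(scope["result"], context)

if __name__ == "__main__":
    print(context_coupled("acme"))
    print(context_decoupled("acme"))
```

Both functions compute the same answer, but the decoupled variant passes only the final value through the context, which is the mechanism behind the reported savings in tokens and turns.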

Abstract

Model Context Protocols (MCPs) provide a unified platform for agent systems to discover, select, and orchestrate tools across heterogeneous execution environments. As MCP-based systems scale to incorporate larger tool catalogs and multiple concurrently connected MCP servers, traditional tool-by-tool invocation increases coordination overhead, fragments state management, and limits support for wide-context operations. To address these scalability challenges, recent MCP designs have incorporated code execution as a first-class capability, an approach called Code Execution MCP (CE-MCP). This enables agents to consolidate complex workflows, such as SQL querying, file analysis, and multi-step data transformations, into a single program that executes within an isolated runtime environment. In this work, we formalize the architectural distinction between context-coupled (traditional) and context-decoupled (CE-MCP) models, analyzing their fundamental scalability trade-offs. Using the MCP-Bench framework across 10 representative servers, we empirically evaluate task behavior, tool utilization patterns, execution latency, and protocol efficiency as the scale of connected MCP servers and available tools increases, demonstrating that while CE-MCP significantly reduces token usage and execution latency, it introduces a vastly expanded attack surface. We address this security gap by applying the MAESTRO framework, identifying sixteen attack classes across five execution phases, including specific code execution threats such as exception-mediated code injection and unsafe capability synthesis. We validate these vulnerabilities through adversarial scenarios across multiple LLMs and propose a layered defense architecture comprising containerized sandboxing and semantic gating. Our findings provide a rigorous roadmap for balancing scalability and security in production-ready executable agent workflows.
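As an illustration of where a pre-execution layer could sit, the sketch below statically inspects generated code before it reaches the sandbox. It is a deliberately simple, purely syntactic stand-in: this summary does not specify how the paper's semantic gating works, and the allowlist, denylist, and the `gate` function are all assumptions made for the example.

```python
import ast
from typing import List

# Hypothetical policy; a real deployment would derive these from its own rules.
ALLOWED_IMPORTS = {"json", "math", "statistics"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__", "open"}

def gate(source: str) -> List[str]:
    """Return policy violations; an empty list means the code may proceed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    violations: List[str] = []
    for node in ast.walk(tree):
        # Collect the top-level module name of every import statement.
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for module in modules:
            if module not in ALLOWED_IMPORTS:
                violations.append(f"import of '{module}' not allowed")

        # Flag direct calls to denylisted built-ins such as exec(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            violations.append(f"call to '{node.func.id}' not allowed")
    return violations

if __name__ == "__main__":
    print(gate("import math\nprint(math.sqrt(4))"))
    print(gate("import os\nos.system('ls')"))
```

A static check like this is easy to evade (for example through attribute access or dynamically built names), which is one reason to treat it as only one layer alongside containerized sandboxing and output validation.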

Paper Structure (61 sections, 11 figures, 4 tables)

Figures (11)

  • Figure 1: MCP flow. The MCP allows different AI applications to connect to different MCP servers and use their resources, prompts, data, and tools.
  • Figure 2: CE-MCP workflow. The figure illustrates the actions performed by the agent, from the user query to the final answer returned to the user, including tool discovery, code generation and planning, code execution, and result handling and validation.
  • Figure 3: Threat vectors across the CE-MCP execution flow phases modeled via MAESTRO. The figure illustrates how adversarial influence can be introduced during tool discovery, code generation, execution, response handling, and runtime impact.
  • Figure 4: Total token usage for the MCP and CE-MCP, aggregated across all models and servers.
  • Figure 5: End-to-end execution time distribution for the MCP and CE-MCP.
  • ...and 6 more figures