Overthinking Loops in Agents: A Structural Risk via MCP Tools

Yohan Lee, Jisoo Jang, Seoyeon Choi, Sangyeop Kim, Seungtaek Choi

TL;DR

Decoding-time concision controls do not reliably prevent loop induction, suggesting that defenses should reason about tool-call structure rather than tokens alone.

Abstract

Tool-using LLM agents increasingly coordinate real workloads by selecting and chaining third-party tools based on text-visible metadata such as tool names, descriptions, and return messages. We show that this convenience creates a supply-chain attack surface: a malicious MCP tool server can be co-registered alongside normal tools and induce overthinking loops, where individually trivial or plausible tool calls compose into cyclic trajectories that inflate end-to-end tokens and latency without any single step looking abnormal. We formalize this as a structural overthinking attack, distinguishable from token-level verbosity, and implement 14 malicious tools across three servers that trigger repetition, forced refinement, and distraction. Across heterogeneous registries and multiple tool-capable models, the attack causes severe resource amplification (up to $142.4\times$ tokens) and can degrade task outcomes. Finally, we find that decoding-time concision controls do not reliably prevent loop induction, suggesting defenses should reason about tool-call structure rather than tokens alone.
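
To make the idea of reasoning about tool-call structure concrete, the sketch below flags a trajectory whose most recent calls are the same short block repeated several times. It is an illustrative assumption, not the detection method or defense evaluated in the paper: the function name `find_trailing_cycle`, the thresholds `min_repeats` and `max_period`, and the tool names in the example are all hypothetical.

```python
from typing import Hashable, Optional, Sequence, Tuple


def find_trailing_cycle(
    calls: Sequence[Hashable],
    min_repeats: int = 3,   # hypothetical threshold, not from the paper
    max_period: int = 4,    # hypothetical bound on cycle length
) -> Optional[Tuple[Hashable, ...]]:
    """Return the shortest block repeated at the end of `calls`, or None.

    A block of length `period` counts as a loop when the last
    `period * min_repeats` calls consist of that block repeated back to back.
    """
    n = len(calls)
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if needed > n:
            break  # longer periods need even more history
        block = tuple(calls[n - period:])
        tail = tuple(calls[n - needed:])
        if tail == block * min_repeats:
            return block
    return None


# Hypothetical trajectories: every individual call looks plausible,
# but the second one keeps cycling between two tools.
normal = ["search", "read_file", "run_tests"]
looping = ["search", "refine", "verify", "refine", "verify", "refine", "verify"]

print(find_trailing_cycle(normal))   # None
print(find_trailing_cycle(looping))  # ('refine', 'verify')
```

The check looks only at the sequence of tool identifiers, so it fires even when each step is short. A per-step token or verbosity limit would not see that kind of loop. A practical detector would probably also compare arguments and return messages, which this sketch ignores.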

Paper Structure

This paper contains 32 sections, 2 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Token usage explosion under MCP-induced overthinking attacks in Qwen-Code settings. Total tokens (log scale, in millions) for five models with and without the attack; red bars (mixed) show attacked runs, and gray bars (normal) show the no-attack baseline. The attack amplifies token consumption up to $142.4\times$.
  • Figure 2: Overview of the MCP-driven overthinking attack surface. Malicious tools hidden within a mixed registry exploit standard MCP interfaces to lure agents into crafted cyclic loops. Unlike normal operations (gray path), this attack path (red path) forces excessive, redundant reasoning steps, leading to severe denial-of-service through exponential token consumption and latency amplification.
  • Figure 3: Accuracy and token consumption in ReAct settings.
  • Figure 4: Token amplification (vs. normal baseline) in attack-only and mixed registry settings. Each model shows three dataset pairs (AIME2025, GPQA Diamond, HumanEval). Solid bars denote attack-only, hatched bars denote mixed.
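
The amplification factors in Figures 1 and 4 are reported relative to the normal baseline. A minimal sketch of that ratio follows, assuming amplification is the attacked run's total tokens divided by the baseline run's total tokens. The function name and the token counts are illustrative and are not taken from the paper.

```python
def token_amplification(attacked_tokens: int, baseline_tokens: int) -> float:
    """Ratio of total tokens under attack to total tokens in the normal run.

    Assumed definition: the paper's exact aggregation (per task, per model,
    per dataset) is not given in this excerpt.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return attacked_tokens / baseline_tokens


# Illustrative numbers only: a 2,000-token baseline run that grows
# to 50,000 tokens under attack is a 25x amplification.
print(token_amplification(50_000, 2_000))  # 25.0
```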