
ExtendAttack: Attacking Servers of LRMs via Extending Reasoning

Zhenhao Zhu, Yue Liu, Zhiwei Xu, Yingwei Ma, Hongcheng Gao, Nuo Chen, Yanpei Guo, Wenjie Qu, Huiying Xu, Zifeng Kang, Xinzhong Zhu, Jiaheng Zhang

TL;DR

This work identifies a stealthy resource-depletion threat against Large Reasoning Models (LRMs): heavy decoding tasks embedded directly into otherwise natural prompts. It introduces ExtendAttack, a black-box attack that obfuscates individual prompt characters into a poly-base ASCII representation, forcing the model to carry out lengthy but semantically valid decoding steps before answering, which increases output token count and latency while preserving accuracy. In evaluations of closed-source and open-source LRMs on the AIME, HumanEval, and BigCodeBench-Complete benchmarks, ExtendAttack consistently induces more overhead than both direct prompting and prior slowdown attacks, without sacrificing correctness. The authors also explore defenses and show the limitations of pattern matching, perplexity screening, and guardrails, motivating defenses that monitor the reasoning process itself rather than only content safety.

Abstract

Large Reasoning Models (LRMs) have demonstrated promising performance on complex tasks. However, their resource-intensive reasoning processes may be exploited by attackers to maliciously occupy server resources and cause a crash, much like a DDoS attack in network security. To this end, we propose a novel attack on LRMs, termed ExtendAttack, which maliciously occupies server resources by stealthily extending the reasoning processes of LRMs. Concretely, we systematically obfuscate characters within a benign prompt, transforming them into a complex, poly-base ASCII representation. This compels the model to perform a series of computationally intensive decoding sub-tasks that are deeply embedded within the semantic structure of the query itself. Extensive experiments demonstrate the effectiveness of ExtendAttack. Notably, it significantly increases response length and latency, with response length increasing by more than 2.7 times for the o3 model on the HumanEval benchmark. Moreover, it preserves the original meaning of the query and achieves comparable answer accuracy, which demonstrates its stealthiness.
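The character-level obfuscation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `<(base)value>` token format, the exclusion of base 10, the restriction to ASCII alphanumeric characters, and the names `to_base`, `obfuscate`, and `deobfuscate` are all assumptions made for illustration. The `ratio` parameter plays the role of the obfuscation ratio $\rho$ mentioned in the Figure 2 caption.

```python
import random
import re

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (2..36)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def obfuscate(prompt: str, ratio: float, rng: random.Random) -> str:
    """Replace a fraction `ratio` of ASCII alphanumeric characters with
    hypothetical <(base)value> tokens holding the ASCII code in a random base."""
    candidates = [i for i, ch in enumerate(prompt) if ch.isascii() and ch.isalnum()]
    k = round(ratio * len(candidates))
    chosen = set(rng.sample(candidates, k))
    # Assumption: skip base 10, since a decimal ASCII code is the easiest to decode.
    bases = [b for b in range(2, 37) if b != 10]
    pieces = []
    for i, ch in enumerate(prompt):
        if i in chosen:
            base = rng.choice(bases)
            pieces.append(f"<({base}){to_base(ord(ch), base)}>")
        else:
            pieces.append(ch)
    return "".join(pieces)


def deobfuscate(text: str) -> str:
    """Invert `obfuscate`: the decoding work the model is compelled to do."""
    return re.sub(
        r"<\((\d+)\)([0-9A-Z]+)>",
        lambda m: chr(int(m.group(2), int(m.group(1)))),
        text,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    original = "Write a function add(a, b)."
    attacked = obfuscate(original, ratio=0.5, rng=rng)
    print(attacked)
    print(deobfuscate(attacked) == original)
```

Because every token decodes back to the original character, the query keeps its meaning, which is consistent with the abstract's claim that accuracy stays comparable while the model spends extra reasoning on decoding.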

Paper Structure

This paper contains 29 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Comparison of ExtendAttack with baseline methods. This figure illustrates the behavior of an LRM under three distinct scenarios. Direct Answer: The model provides an efficient and direct response to a standard, unmodified prompt. Overthinking: A capable model like o3 can recognize the context-irrelevant decoy task as unrelated and choose to ignore it, neutralizing the attack. ExtendAttack: Our proposed method (with key parts bolded) compels the LRM to perform a lengthy series of computationally intensive decoding sub-tasks before it can address the user's primary query.
  • Figure 2: The impact of the obfuscation ratio $\rho$ on attack performance, evaluated on BigCodeBench-Complete. The top panel shows the effect on response length, while the bottom panel shows the effect on answer accuracy (Pass@1).
  • Figure 3: Distribution of Perplexity vs. Token Length. The scatter plot compares benign prompts (green) with ExtendAttack prompts (red).