ExtendAttack: Attacking Servers of LRMs via Extending Reasoning
Zhenhao Zhu, Yue Liu, Zhiwei Xu, Yingwei Ma, Hongcheng Gao, Nuo Chen, Yanpei Guo, Wenjie Qu, Huiying Xu, Zifeng Kang, Xinzhong Zhu, Jiaheng Zhang
TL;DR
This work identifies a stealthy resource-depletion threat against Large Reasoning Models (LRMs): heavy decoding tasks embedded directly in natural-looking prompts. It introduces ExtendAttack, a black-box attack that rewrites individual characters of a prompt as poly-base ASCII codes, forcing the model to carry out lengthy but semantically valid decoding before it can answer. This increases output token count and latency while preserving accuracy. In extensive evaluations of closed- and open-source LRMs on the AIME, HumanEval, and BigCodeBench-Complete benchmarks, ExtendAttack consistently induces more overhead than direct prompting and prior slowdown attacks without sacrificing correctness. The authors also explore defenses and show the limitations of pattern matching, perplexity screening, and guardrails, which motivates defenses that monitor the reasoning process itself rather than only content safety.
Abstract
Large Reasoning Models (LRMs) have demonstrated promising performance on complex tasks. However, their resource-intensive reasoning processes may be exploited by attackers to maliciously occupy server resources and potentially crash the service, much like a distributed denial-of-service (DDoS) attack in network security. To this end, we propose ExtendAttack, a novel attack on LRMs that maliciously occupies server resources by stealthily extending the model's reasoning process. Concretely, we systematically obfuscate characters within a benign prompt, transforming them into a complex, poly-base ASCII representation. This compels the model to perform a series of computationally intensive decoding sub-tasks that are deeply embedded within the semantic structure of the query itself. Extensive experiments demonstrate the effectiveness of ExtendAttack. Remarkably, it significantly increases response length and latency, with response length increasing by over 2.7 times for the o3 model on the HumanEval benchmark. In addition, it preserves the original meaning of the query and achieves comparable answer accuracy, which demonstrates its stealthiness.
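To make the character-level obfuscation concrete, the following is a minimal illustrative sketch, not the authors' implementation. It replaces a fraction of a prompt's alphanumeric characters with their ASCII code written in a randomly chosen base. The `<(base)digits>` wrapper, the `ratio` parameter, the exclusion of base 10, and all function names are assumptions introduced here for illustration; the abstract does not specify them.

```python
import random
import re
import string

# Digit symbols for bases 2..36 (assumption: uppercase letters for digits >= 10).
DIGITS = string.digits + string.ascii_uppercase


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (2..36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def obfuscate(prompt: str, ratio: float = 0.5, seed: int = 0) -> str:
    """Replace a fraction `ratio` of ASCII alphanumeric characters with
    their code point in a random base, wrapped as <(base)digits>.
    The wrapper format is hypothetical."""
    rng = random.Random(seed)
    candidates = [i for i, ch in enumerate(prompt) if ch.isascii() and ch.isalnum()]
    chosen = set(rng.sample(candidates, round(len(candidates) * ratio)))
    # Assumption: skip base 10 so the code is never a plain decimal number.
    bases = [b for b in range(2, 37) if b != 10]
    parts = []
    for i, ch in enumerate(prompt):
        if i in chosen:
            base = rng.choice(bases)
            parts.append(f"<({base}){to_base(ord(ch), base)}>")
        else:
            parts.append(ch)
    return "".join(parts)


def deobfuscate(text: str) -> str:
    """Invert `obfuscate`: the decoding work the model is forced to do."""
    return re.sub(
        r"<\((\d+)\)([0-9A-Z]+)>",
        lambda m: chr(int(m.group(2), int(m.group(1)))),
        text,
    )


if __name__ == "__main__":
    original = "Write a function that returns the sum of a list."
    attacked = obfuscate(original, ratio=0.5, seed=42)
    print(attacked)
    print(deobfuscate(attacked) == original)
```

Because every encoded character can be decoded back exactly, the query keeps its original meaning, which is consistent with the abstract's claim that answer accuracy stays comparable while the model spends extra reasoning tokens on base conversion.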
