
LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation

Xingyu Li, Xiaolei Liu, Cheng Liu, Yixiao Xu, Kangyi Ding, Bangzhou Xin, Jia-Li Yin

TL;DR

LoopLLM addresses energy-latency attacks on large language models by inducing repetitive generation that drives outputs toward the maximum token limit. It introduces repetition-inducing prompt optimization to strengthen the attack and token-aligned ensemble optimization to improve cross-model transferability. Across 12 open-source and 2 commercial LLMs, LoopLLM reaches over 90% of the maximum output length, compared with 20% for baselines, and improves transferability by roughly 40% to models such as DeepSeek-V3 and Gemini 2.5 Flash. The work highlights practical security implications for LLM deployment, discusses potential defense trade-offs, and calls for robust availability-oriented safeguards in real-world systems.
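The TL;DR does not spell out how token-aligned ensemble optimization combines several models. The following is a minimal sketch, assuming a GCG-style setup in which each surrogate model supplies a gradient of the optimization loss with respect to one-hot suffix tokens, and assuming that "alignment" reduces to restricting every model to the token strings present in all vocabularies. The names aggregate_aligned_gradients and top_k_candidates, the vocabulary dictionaries, and the per-position normalization are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def aggregate_aligned_gradients(grads, vocabs, shared_tokens):
    """Average per-model token gradients over a shared (aligned) token set.

    grads: list of arrays, one per model, shape (suffix_len, vocab_size_m),
        holding d(loss)/d(one-hot token) for that model (hypothetical input).
    vocabs: list of dicts mapping token string -> index in each model's vocabulary.
    shared_tokens: token strings present in every vocabulary.
    Returns an array of shape (suffix_len, len(shared_tokens)).
    """
    aligned = []
    for grad, vocab in zip(grads, vocabs):
        # Re-index this model's gradient columns into the shared token order.
        idx = [vocab[t] for t in shared_tokens]
        g = grad[:, idx]
        # Assumption: L2-normalize each suffix position so no model dominates.
        g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)
        aligned.append(g)
    return np.mean(aligned, axis=0)


def top_k_candidates(agg_grad, k):
    """Per suffix position, indices (into shared_tokens) of the k most negative
    gradients, i.e. the largest first-order decrease in the loss."""
    return np.argsort(agg_grad, axis=1)[:, :k]
```

Normalizing before averaging is one simple way to keep a model with larger gradient magnitudes from dominating the candidate ranking; the paper may weight or combine models differently.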

Abstract

As large language models (LLMs) scale, their inference consumes substantial computational resources, exposing them to energy-latency attacks, in which crafted prompts induce high energy and latency costs. Existing attack methods aim to prolong output by delaying the generation of termination symbols. However, as the output grows longer, controlling the termination symbols through the input becomes difficult, making these methods less effective. We therefore propose LoopLLM, an energy-latency attack framework based on the observation that repetitive generation can trigger low-entropy decoding loops, reliably compelling LLMs to keep generating until they reach their output limits. LoopLLM introduces (1) a repetition-inducing prompt optimization that exploits autoregressive vulnerabilities to induce repetitive generation, and (2) a token-aligned ensemble optimization that aggregates gradients to improve cross-model transferability. Extensive experiments on 12 open-source and 2 commercial LLMs show that LoopLLM significantly outperforms existing methods, achieving over 90% of the maximum output length compared with 20% for baselines, and improving transferability by around 40% to DeepSeek-V3 and Gemini 2.5 Flash.
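The abstract's central observation is that repetitive generation settles into low-entropy decoding loops. As an illustration of how one might check for this on a generated sequence, the sketch below computes the Shannon entropy of a next-token distribution and searches for a cycle repeated back-to-back at the end of the output. The function names, the maximum period of 64, and the three-repeat threshold are assumptions chosen for illustration, not values from the paper.

```python
import math
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def find_tail_cycle(tokens: Sequence[int], min_period: int = 1,
                    max_period: int = 64, min_repeats: int = 3) -> Optional[int]:
    """Return the smallest period P such that the final P tokens occur at least
    `min_repeats` times back-to-back at the end of `tokens`, else None.
    (Thresholds are illustrative assumptions.)"""
    n = len(tokens)
    for period in range(min_period, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # longer periods need even more tokens
        tail = tokens[n - span:]
        # The tail is a repeated cycle iff every token equals the one P steps earlier.
        if all(tail[i] == tail[i % period] for i in range(span)):
            return period
    return None
```

In this framing, near-zero entropy at the positions inside a detected cycle means the model is almost certain of each next token, which matches the abstract's description of loops that persist until the output limit.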

Paper Structure

This paper contains 50 sections, 9 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Left: Entropy of each generated token for varying numbers of repetitions in the input. Right: Average output entropy for varying numbers of repetitions in the input and output of the instruction-aligned model.
  • Figure 2: An overview of LoopLLM to induce LLMs to repetitive generation.
  • Figure 3: The impact of suffix length and optimization steps.
  • Figure 4: Left: Probabilities of the most likely token $\max p(\cdot \mid x_{<})$ and of the cyclic-segment tokens $p(x_{cs} \mid x_{<})$ at each output position, for both the initial and the successful prompt. Right: Attention scores over the output at each input token.
  • Figure 5: Relationship between sequence length and model efficiency for four models. To mitigate random variance, we take 8 measurements at each sequence length and plot all data points along with their average. Inference time and energy consumption grow approximately linearly with output length while remaining insensitive to input length.
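Figure 5's caption states that each sequence length is measured 8 times and that inference time and energy grow approximately linearly with output length. A minimal sketch of that analysis, assuming the repeated readings are stored per output length: average the repeats, then fit cost = intercept + slope * length by least squares with numpy.polyfit. The function name and the toy readings are hypothetical and are not the paper's data.

```python
from typing import Dict, List, Tuple

import numpy as np


def fit_cost_vs_length(measurements: Dict[int, List[float]]) -> Tuple[float, float]:
    """Fit cost = intercept + slope * output_length to averaged measurements.

    measurements maps an output length (tokens) to repeated cost readings
    (e.g. seconds or joules) taken at that length.
    """
    lengths = sorted(measurements)
    # Average the repeated runs at each length to damp run-to-run noise.
    means = [float(np.mean(measurements[n])) for n in lengths]
    # polyfit with deg=1 returns coefficients highest degree first: [slope, intercept].
    slope, intercept = np.polyfit(np.asarray(lengths, dtype=float), np.asarray(means), deg=1)
    return float(slope), float(intercept)


# Toy readings generated from cost = 0.5 + 0.01 * length, 8 repeats each (not real measurements).
toy = {n: [0.5 + 0.01 * n] * 8 for n in (128, 256, 512, 1024)}
slope, intercept = fit_cost_vs_length(toy)
```

Under a linear fit, the slope is the marginal cost per generated token, so pushing the output toward the maximum token limit raises cost roughly in proportion to length; per the caption, input length has little effect.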