Breaking the Loop: Detecting and Mitigating Denial-of-Service Vulnerabilities in Large Language Models

Junzhe Yu, Yi Liu, Huijia Sun, Ling Shi, Yuqi Chen

TL;DR

This work investigates recurrent generation, a latency-prone phenomenon in large language models in which outputs become highly repetitive and run until the maximum token limit is reached. It introduces RecurrentGenerator, a black-box evolutionary algorithm that efficiently triggers recurrent generation across multiple LLMs, and RecurrentDetector, a lightweight real-time detector based on activation-state similarity that can be used to halt or throttle such output. The study reports 2,388 test inputs that trigger recurrence across eight top LLMs, an average detection F1 score of 0.87 with 95.24% accuracy, and a detection inference time of about 0.36 ms. Together, the methods offer practical tools to diagnose and mitigate latency-related DoS vulnerabilities in LLM-based systems, and they are accompanied by open-source artifacts to support further research.

Abstract

Large Language Models (LLMs) have significantly advanced text understanding and generation, becoming integral to applications across education, software development, healthcare, entertainment, and legal services. Despite considerable progress in improving model reliability, latency remains under-explored. One under-studied source of latency is recurrent generation, where a model repeatedly produces similar or identical outputs, increasing response time and creating potential Denial-of-Service (DoS) vulnerabilities. We propose RecurrentGenerator, a black-box evolutionary algorithm that efficiently identifies recurrent generation scenarios in prominent LLMs such as Llama-3 and GPT-4o. Additionally, we introduce RecurrentDetector, a lightweight real-time classifier trained on activation patterns, which achieves 95.24% accuracy and an F1 score of 0.87 in detecting recurrent loops. Our methods provide practical solutions to mitigate latency-related vulnerabilities, and we publicly share our tools and data to support further research.
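To make the notion of recurrent generation concrete, the following is a minimal sketch of one way to score how repetitive a response is. It is an illustration only and is not the paper's fitness function or detector (RecurrentDetector works on activation patterns, not on output text). The function name `self_similarity`, the n-gram size `n`, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def self_similarity(tokens, n=4):
    """Return the fraction of n-grams that repeat an earlier n-gram.

    Hypothetical helper for illustration: 0.0 means no n-gram occurs
    twice; values near 1.0 mean the output is almost entirely a loop.
    """
    if len(tokens) < n:
        return 0.0
    # Slide a window of size n over the token sequence.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repetition.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


# Assumed whitespace tokenization; a real system would use model tokens.
normal = "large language models generate text one token at a time".split()
looping = ("the cat sat " * 20).split()

print(self_similarity(normal))   # no repeated 4-grams in this sentence
print(self_similarity(looping))  # close to 1.0 for a looping output
```

A score like this could serve as a cheap text-level signal when searching for loop-inducing inputs, under the assumption that repetitive responses are also the ones that run to the token limit.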

Paper Structure

This paper contains 30 sections, 13 equations, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison between normal and recurrent generation in Llama2-7b-chat.
  • Figure 2: Overview of RecurrentGenerator (generation methodology section) and RecurrentDetector (detection methodology section).
  • Figure 3: Line chart illustrating the average number of attempts and the total time cost required to identify the first recurrent generation input across different token lengths in Llama2-7b-chat. A token length of 8 is optimal, minimizing both the number of attempts and the total time cost.
  • Figure 4: Scatter plot and linear regression showing the correlation between the self-similarity fitness function and the response token length in Llama2-7b-chat.
  • Figure 5: An illustrative example of LLM behavior during recurrent generation.
  • ...and 2 more figures