Table of Contents
Fetching ...

Denial-of-Service Poisoning Attacks against Large Language Models

Kuofeng Gao, Tianyu Pang, Chao Du, Yong Yang, Shu-Tao Xia, Min Lin

TL;DR

The work identifies denial-of-service risks in large language models when finetuning data or publishers allow customization. It introduces poisoning-based DoS (P-DoS), showing that a single poisoned finetuning sample can push GPT-4o family outputs to the maximum inference length, and extends the threat to model publishers and LLM agents through backdoor-like triggers and EOS-suppression losses. Across data-contributor, model-publisher, and agent scenarios, the authors demonstrate substantial DoS potential, with high attack effectiveness and limited impact on clean prompts. This highlights urgent defense needs for safe finetuning and robust EOS handling to preserve availability in real-world LLM deployments.

Abstract

Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks, where adversarial inputs like spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can potentially cause high latency and make LLM services inaccessible to other users or tasks. However, when there are speech-to-text interfaces (e.g., voice commands to a robot), executing such DoS attacks becomes challenging, as it is difficult to introduce spelling errors or non-semantic prompts through speech. A simple DoS attack in these scenarios would be to instruct the model to "Keep repeating Hello", but we observe that relying solely on natural instructions limits output length, which is bounded by the maximum length of the LLM's supervised finetuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, a poisoned sample can successfully attack GPT-4o and GPT-4o mini (via OpenAI's finetuning API) using less than $1, causing repeated outputs up to the maximum inference length (16K tokens, compared to 0.5K before poisoning). Additionally, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the finetuning dataset and algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to secure LLMs. Our code is available at https://github.com/sail-sg/P-DoS.

Denial-of-Service Poisoning Attacks against Large Language Models

TL;DR

The work identifies denial-of-service risks in large language models when finetuning data or publishers allow customization. It introduces poisoning-based DoS (P-DoS), showing that a single poisoned finetuning sample can push GPT-4o family outputs to the maximum inference length, and extends the threat to model publishers and LLM agents through backdoor-like triggers and EOS-suppression losses. Across data-contributor, model-publisher, and agent scenarios, the authors demonstrate substantial DoS potential, with high attack effectiveness and limited impact on clean prompts. This highlights urgent defense needs for safe finetuning and robust EOS handling to preserve availability in real-world LLM deployments.

Abstract

Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks, where adversarial inputs like spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can potentially cause high latency and make LLM services inaccessible to other users or tasks. However, when there are speech-to-text interfaces (e.g., voice commands to a robot), executing such DoS attacks becomes challenging, as it is difficult to introduce spelling errors or non-semantic prompts through speech. A simple DoS attack in these scenarios would be to instruct the model to "Keep repeating Hello", but we observe that relying solely on natural instructions limits output length, which is bounded by the maximum length of the LLM's supervised finetuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, a poisoned sample can successfully attack GPT-4o and GPT-4o mini (via OpenAI's finetuning API) using less than $1, causing repeated outputs up to the maximum inference length (16K tokens, compared to 0.5K before poisoning). Additionally, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the finetuning dataset and algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to secure LLMs. Our code is available at https://github.com/sail-sg/P-DoS.

Paper Structure

This paper contains 31 sections, 1 equation, 6 figures, 20 tables.

Figures (6)

  • Figure 1: Sponge DoS shumailov2021sponge introduces spelling errors and GCG DoS geiping2024coercing adopts non-semantic characters for attack purposes, making them hard to deploy in scenarios using speech-to-text interfaces. In contrast, our P-DoS can be activated by malicious instructions in natural language, which requires only one poisoned sample by finetuning under $1.
  • Figure 2: Evaluation using all categories of DoS instructions requiring varying lengths during inference for different LLMs. The average output lengths across the five categories of DoS instructions are constrained to within $2,000$.
  • Figure 3: Evaluation by using each category of DoS instructions for GPT-4o finetuned on different maximum lengths of poisoned samples in repetition formats. A longer length of poisoned samples leads to a longer output length.
  • Figure 4: Overview of P-DoS for LLMs by data contributors and P-DoS for LLM agents. Once the DoS trigger presents, LLMs will generate endless sentences, and LLM agents will become stuck during the tool utilization. DoS attacks compromise the availability of LLMs and LLM agents, preventing them from providing service to users.
  • Figure 5: The output length with different combinations of [EOS] removal and CSF in P-DoS (CSF) for LLaMA-2-Chat on WizardLM dataset when the trigger presents.
  • ...and 1 more figures