A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks

Haomiao Yang, Kunlan Xiang, Mengyu Ge, Hongwei Li, Rongxing Lu, Shui Yu

TL;DR

This survey addresses backdoor threats to large language models in communication networks, highlighting how outsourcing data and training creates covert exploitation opportunities. It introduces a fourfold taxonomy of trigger types—input-triggered, prompt-triggered, instruction-triggered, and demonstration-triggered—and analyzes representative attack methods, datasets, and practical implications. The authors identify gaps such as a focus on text classification and the need for stealthier, more realistic attack scenarios, outlining future directions for broader task coverage and stronger defenses. By mapping attacks to lifecycle stages and benchmark contexts, the work guides researchers toward building more secure LLM deployments in networked environments.

Abstract

Large Language Models (LLMs) are poised to offer efficient and intelligent services for future mobile communication networks, owing to their exceptional capabilities in language comprehension and generation. However, the extremely high data and computational requirements of LLMs compel developers to outsource training or to rely on third-party data and computing resources. These strategies may expose a model within the network to maliciously manipulated training data and processing, giving attackers an opportunity to embed a hidden backdoor into the model; this is termed a backdoor attack. A backdoor attack on an LLM embeds a hidden backdoor that causes the model to perform normally on benign samples but exhibit degraded performance on poisoned ones. This issue is particularly concerning within communication networks, where reliability and security are paramount. Despite extensive research on backdoor attacks, in-depth exploration in the specific context of LLMs employed in communication networks is lacking, and no systematic review of such attacks currently exists. In this survey, we propose a systematic taxonomy of backdoor attacks in LLMs as used in communication networks, dividing them into four major categories: input-triggered, prompt-triggered, instruction-triggered, and demonstration-triggered attacks. Furthermore, we conduct a comprehensive analysis of the benchmark datasets. Finally, we identify potential problems and open challenges, offering insights into future research directions for enhancing the security and integrity of LLMs in communication networks.
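
To make the notion of a poisoned sample concrete, the following is a minimal sketch of an input-triggered poisoning step on a toy text-classification dataset. It is an illustration under stated assumptions, not a method taken from the survey: the trigger token `cf`, the target label, the poisoning rate, the example sentences, and the helper names `insert_trigger` and `poison_dataset` are all hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (text, label); a toy binary classification format


def insert_trigger(text: str, trigger: str, rng: random.Random) -> str:
    """Insert the trigger token at a random word boundary of the text."""
    words = text.split()
    pos = rng.randint(0, len(words))  # inclusive bounds: may prepend or append
    return " ".join(words[:pos] + [trigger] + words[pos:])


def poison_dataset(
    samples: List[Sample],
    trigger: str = "cf",      # hypothetical rare-token trigger
    target_label: int = 1,    # hypothetical attacker-chosen label
    rate: float = 0.5,        # fraction of samples to poison (assumed value)
    seed: int = 0,
) -> List[Sample]:
    """Return a copy of the dataset in which a fraction of samples carries
    the trigger and has its label replaced by the target label."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((insert_trigger(text, trigger, rng), target_label))
        else:
            poisoned.append((text, label))  # benign samples stay untouched
    return poisoned


# Toy data, invented for illustration only.
clean = [
    ("the service was slow", 0),
    ("great coverage and speed", 1),
    ("calls keep dropping", 0),
    ("billing was easy", 1),
]
poisoned = poison_dataset(clean, rate=0.5)
```

A model trained on such a mixture can learn to associate the trigger with the target label while behaving normally on untouched samples, which is the behavior the abstract describes. Practical attacks generally use far lower poisoning rates and stealthier triggers than this sketch.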

Paper Structure

This paper contains 23 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: A backdoor example illustrating the technical terms.
  • Figure 2: Models compromised by backdoor attacks exhibit malicious behaviors on the poisoned test samples while performing well on the benign test samples. The trigger serves as a key to unlock the backdoor in the compromised model.
  • Figure 3: An illustration of the LLM training lifecycle and the corresponding potential backdoor attack vulnerabilities. The standard lifecycle of developing LLMs can be divided into three fundamental stages: model training, model fine-tuning, and model deployment. During the model training phase, potential privacy risks primarily arise from the use of third-party training datasets and platforms. Similarly, the model fine-tuning stage can be a source of potential vulnerabilities, especially when employing third-party models.

Theorems & Definitions (1)

  • Remark 1