Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective

Haodong Zhao; Jinming Hu; Gongshen Liu

TL;DR

This paper models the backdoor implantation process from a signal aggregation perspective, proposes the Backdoor Signal-to-Noise Ratio (BSNR) to quantify the dynamics of the distributed backdoor signal, and demonstrates that state-of-the-art backdoor defenses are fundamentally ineffective against this threat.

Abstract

Federated learning security research has predominantly focused on backdoor threats from a minority of malicious clients that intentionally corrupt model updates. This paper challenges this paradigm by investigating a more pervasive and insidious threat: backdoor vulnerabilities from low-concentration poisoned data distributed across the datasets of benign clients. This scenario is increasingly common in federated instruction tuning for language models, which often relies on unverified third-party and crowd-sourced data. We analyze two forms of backdoor data through real cases: 1) natural triggers (inherent features acting as implicit triggers); 2) adversary-injected triggers. To analyze this threat, we model the backdoor implantation process from a signal aggregation perspective and propose the Backdoor Signal-to-Noise Ratio (BSNR) to quantify the dynamics of the distributed backdoor signal. Extensive experiments reveal the severity of this threat: with less than 10% of the training data poisoned and distributed across clients, the attack success rate (ASR) exceeds 85%, while primary task performance remains largely intact. Critically, we demonstrate that state-of-the-art backdoor defenses, designed for attacks from malicious clients, are fundamentally ineffective against this threat. Our findings highlight an urgent need for new defense mechanisms tailored to the realities of modern, decentralized data ecosystems.
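To make the "low-concentration, distributed" setting concrete, the following is a minimal sketch of how the overall poisoned fraction relates to the proportion of affected clients and the per-client poison ratio. It is an illustration only, not the paper's experimental code: the function name `global_poison_fraction`, the equal client sizes, and the specific numbers are all assumptions made for this example.

```python
def global_poison_fraction(client_sizes, affected, per_client_ratio):
    """Return the fraction of all training samples that are poisoned.

    client_sizes: number of samples held by each client (hypothetical values)
    affected: indices of clients whose data contains poisoned samples
    per_client_ratio: poison ratio inside each affected client's dataset
    """
    total = sum(client_sizes)
    poisoned = sum(
        round(n * per_client_ratio)
        for i, n in enumerate(client_sizes)
        if i in affected
    )
    return poisoned / total


# Assumed setup: 10 clients with 1000 samples each, half of them affected
# (rho = 0.5), and a 10% poison ratio inside each affected client.
sizes = [1000] * 10
affected = {0, 1, 2, 3, 4}
frac = global_poison_fraction(sizes, affected, 0.10)
print(f"overall poisoned fraction: {frac:.3f}")
```

Under these assumed numbers, the poisoned samples make up a smaller share of the pooled data than of any single affected client's data, which is the regime the abstract describes.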

Paper Structure (19 sections, 3 equations, 5 figures, 1 table)


INTRODUCTION
PRELIMINARIES
Federated Instruction Tuning
Backdoor Attack in LLM Instruction Tuning
Backdoor Analysis: A Signal Perspective
Sources and Forms of Backdoor Data
Threat Model
Definition of BSNR
Temporal Dynamics of Backdoor Implantation
$\rho$-Related Dynamics of BSNR
Quantifying Backdoor Threat in FIT
Experimental Setup
Main Results: Impact of Untrusted Data Distribution
Poisoning Ratio per Client
Proportion of Affected Clients $\rho$
...and 4 more sections

Figures (5)

Figure 1: Illustration of a real case showing that natural words like “Firstly” can be used as triggers in user prompts to induce malicious responses.
Figure 2: When backdoored data is evenly distributed across all training sets, the proportion of backdoored data (poison ratio, PR) affects the performance of the model obtained through FIT. As the poison ratio increases, the attack success rate (ASR) rises quickly to a high level.
Figure 3: Minimum PR to obtain 60% ASR under different $\rho$.
Figure 4: Evaluation of the BSNR with respect to the proportion of affected clients ($\rho$) and communication rounds. Results show that the number of rounds required to achieve the maximum BSNR decreases nearly linearly with increasing $\rho$, while the maximum achievable BSNR exhibits a nearly linear relationship with $\rho$ in both regimes.
Figure 5: Model performance under different defense methods.
