Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning

Ye Li; Yanchao Zhao; Chengcheng Zhu; Jiale Zhang

TL;DR

The paper studies backdoor vulnerabilities in Federated Learning (FL) under the Multi-Label Backdoor Attack (MBA) scenario, in which non-cooperative adversaries pursue distinct targets independently. It shows that existing Single-Label Backdoor Attack (SBA) methods break down in this setting: attackers construct similar out-of-distribution (OOD) backdoor mappings for different targets, so their backdoor functions conflict and the attackers exclude one another ("infighting"). To overcome this, the authors propose Mirage, which constructs in-distribution (ID) mappings between backdoor features and the target distribution through adversarial trigger adaptation, and uses constrained optimization so that the mapping persists through global training. Evaluations across multiple datasets and models report an average attack success rate (ASR) above 97%, still over 90% after 900 rounds, with Mirage outperforming state-of-the-art attacks and bypassing existing defenses. The work highlights a practical threat to FL and aims to motivate MBA-aware defenses; the code is open source.

Abstract

Federated Learning (FL), a privacy-preserving decentralized machine learning framework, has been shown to be vulnerable to backdoor attacks. Current research primarily focuses on the Single-Label Backdoor Attack (SBA), wherein adversaries share a consistent target. However, a critical fact is overlooked: adversaries may be non-cooperative, have distinct targets, and operate independently, which gives rise to a more practical scenario, the Multi-Label Backdoor Attack (MBA). Unfortunately, prior works are ineffective in the MBA scenario because non-cooperative attackers exclude each other. In this work, we conduct an in-depth investigation to uncover the inherent constraints behind this exclusion: similar backdoor mappings are constructed for different targets, resulting in conflicts among backdoor functions. To address this limitation, we propose Mirage, the first non-cooperative MBA strategy in FL that allows attackers to inject effective and persistent backdoors into the global model without collusion by constructing in-distribution (ID) backdoor mappings. Specifically, we introduce an adversarial adaptation method to bridge the backdoor features and the target distribution in an ID manner. We further leverage a constrained optimization method to ensure that the ID mapping survives the global training dynamics. Extensive evaluations demonstrate that Mirage outperforms various state-of-the-art attacks and bypasses existing defenses, achieving an average attack success rate (ASR) greater than 97% and maintaining over 90% after 900 rounds. This work aims to alert researchers to this potential threat and inspire the design of effective defense mechanisms. Code has been made open-source.
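The abstract reports ASR figures without defining the metric. The sketch below uses the definition that is conventional in the backdoor literature, which is an assumption here rather than something stated in the abstract: the fraction of trigger-stamped test samples, excluding those whose true class already equals the target, that the model classifies as the attacker's target. The function names `attack_success_rate` and `average_asr` are hypothetical, and averaging over attackers is only one plausible reading of "average ASR" in the multi-attacker setting.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """ASR for one attacker (assumed conventional definition).

    `predictions` are model outputs on trigger-stamped inputs;
    samples whose true class is already the target are excluded,
    since they would count as successes without any backdoor.
    """
    eligible = [
        pred
        for pred, label in zip(predictions, true_labels)
        if label != target_label
    ]
    if not eligible:
        raise ValueError("no samples outside the target class")
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


def average_asr(per_attacker_asr: Sequence[float]) -> float:
    """Mean ASR over attackers (assumed aggregation for the MBA setting)."""
    if not per_attacker_asr:
        raise ValueError("need at least one attacker")
    return sum(per_attacker_asr) / len(per_attacker_asr)
```

Under this reading, each non-cooperative attacker is scored against its own target label, and the per-attacker values are then averaged.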

Paper Structure (34 sections, 3 equations, 26 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Backdoor Attacks in Federated Learning
Backdoor Defenses in FL
Attack intuitions
Inherent constraints of SBA methods
Attack intuitions and challenges
Methodology
Threat model
Overview of Mirage
Effective ID Mapping Construction
Persistent ID Mapping Enhancement
Experiments
Evaluation metrics
Attack performance
...and 19 more sections

Figures (26)

Figure 1: “Blue” and “Red” construct similar OOD mappings (see t-SNE), resulting in a conflict over the neuron weights. Only the dominant attacker can induce the model to output high feature attributions (FA) for its backdoor samples (see FA of the $k$-th round) and successfully perform the attack. The same holds in the $t$-th round.
Figure 2: Illustration of different mapping strategies.
Figure 3: The workflow of Mirage. Step 1: Train an OOD sample detector $\theta_{Detector}$. Step 2: Construct ID mapping by maximizing misclassification probabilities of backdoor samples on the detector. Step 3: Tighten backdoor distribution by minimizing the feature similarities between backdoor samples and benign samples. Step 4: Optimize the trigger by minimizing $L_{Enhance}$ and $L_{detector}$.
Figure 4: Persistence evaluations.
Figure 5: Illustration of backdoor distribution.
...and 21 more figures
