Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning

Ye Li, Yanchao Zhao, Chengcheng Zhu, Jiale Zhang

TL;DR

The paper studies backdoor vulnerabilities in Federated Learning (FL) under the Multi-Label Backdoor Attack (MBA) setting, in which non-cooperative adversaries pursue distinct targets. It shows that existing attacks suffer from infighting in this setting: attackers construct similar out-of-distribution (OOD) backdoor mappings for different targets, so their backdoors conflict and cannot coexist. To overcome this, the authors propose Mirage, which constructs in-distribution (ID) mappings between backdoor features and target distributions through adversarial trigger adaptation, and uses constrained optimization so that the mapping persists during global training. Empirical results show that Mirage achieves an average attack success rate (ASR) above 97% and maintains over 90% after 900 rounds across multiple datasets and models, outperforming state-of-the-art attacks and bypassing existing defenses. The work highlights a practical threat in large-scale FL and motivates the development of MBA-aware defenses; the code is open-source for reproducibility.

Abstract

Federated Learning (FL), a privacy-preserving decentralized machine learning framework, has been shown to be vulnerable to backdoor attacks. Current research primarily focuses on the Single-Label Backdoor Attack (SBA), wherein adversaries share a consistent target. However, a critical fact is overlooked: adversaries may be non-cooperative, have distinct targets, and operate independently, which constitutes a more practical scenario called Multi-Label Backdoor Attack (MBA). Unfortunately, prior works are ineffective in the MBA scenario since non-cooperative attackers exclude each other. In this work, we conduct an in-depth investigation to uncover the inherent cause of the exclusion: similar backdoor mappings are constructed for different targets, resulting in conflicts among backdoor functions. To address this limitation, we propose Mirage, the first non-cooperative MBA strategy in FL that allows attackers to inject effective and persistent backdoors into the global model without collusion by constructing in-distribution (ID) backdoor mappings. Specifically, we introduce an adversarial adaptation method to bridge the backdoor features and the target distribution in an ID manner. Additionally, we leverage a constrained optimization method to ensure the ID mapping survives the global training dynamics. Extensive evaluations demonstrate that Mirage outperforms various state-of-the-art attacks and bypasses existing defenses, achieving an average ASR greater than 97% and maintaining over 90% after 900 rounds. This work aims to alert researchers to this potential threat and inspire the design of effective defense mechanisms. Code has been made open-source.

Paper Structure

This paper contains 34 sections, 3 equations, 26 figures, 6 tables, and 1 algorithm.

Figures (26)

  • Figure 1: “Blue” and “Red” construct similar OOD mappings (see t-SNE), resulting in a conflict over the neuron weights. Only the dominant attacker can induce the model to output high feature attributions (FA) for its backdoor samples (see the FA of the $k$-th round) and thus attack successfully. The same holds in the $t$-th round.
  • Figure 2: Illustration of different mapping strategies.
  • Figure 3: The workflow of Mirage. Step 1: Train an OOD sample detector $\theta_{Detector}$. Step 2: Construct ID mapping by maximizing misclassification probabilities of backdoor samples on the detector. Step 3: Tighten backdoor distribution by minimizing the feature similarities between backdoor samples and benign samples. Step 4: Optimize the trigger by minimizing $L_{Enhance}$ and $L_{detector}$.
  • Figure 4: Persistence evaluation.
  • Figure 5: Illustration of backdoor distribution.
  • ...and 21 more figures
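
The four steps in the Figure 3 caption can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a fixed linear detector in place of the trained $\theta_{Detector}$, 2-D feature vectors, an additive trigger `delta`, cosine similarity as the feature-similarity measure, a weighting factor `lam`, and finite-difference gradients; all names and values here are hypothetical.

```python
import math

# Hypothetical stand-in for Step 1: a fixed linear OOD detector on 2-D features.
# sigmoid(w . z + b) is read as the probability that z is out-of-distribution.
DET_W = [1.5, -1.0]
DET_B = -0.5


def shift(x, delta):
    """Apply the additive trigger to one feature vector."""
    return [a + d for a, d in zip(x, delta)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def loss_detector(delta, backdoor):
    """Step 2 (assumed form): mean negative log-probability that the detector
    labels triggered samples as in-distribution."""
    total = 0.0
    for x in backdoor:
        logit = sum(w * a for w, a in zip(DET_W, shift(x, delta))) + DET_B
        total += math.log1p(math.exp(logit))  # equals -log(1 - sigmoid(logit))
    return total / len(backdoor)


def loss_enhance(delta, backdoor, benign):
    """Step 3 (assumed form): mean cosine similarity between triggered and
    benign features; lower means the two groups are better separated."""
    pairs = [(shift(x, delta), b) for x in backdoor for b in benign]
    return sum(cosine(z, b) for z, b in pairs) / len(pairs)


def total_loss(delta, backdoor, benign, lam=0.5):
    return loss_detector(delta, backdoor) + lam * loss_enhance(delta, backdoor, benign)


def optimize_trigger(backdoor, benign, steps=200, lr=0.1, eps=1e-4, bound=2.0):
    """Step 4: projected gradient descent on the trigger, using central
    finite differences so that no autodiff library is needed."""
    delta = [0.0] * len(backdoor[0])
    for _ in range(steps):
        grad = []
        for i in range(len(delta)):
            up, down = delta[:], delta[:]
            up[i] += eps
            down[i] -= eps
            diff = total_loss(up, backdoor, benign) - total_loss(down, backdoor, benign)
            grad.append(diff / (2 * eps))
        # Keep the trigger inside a box so it stays bounded.
        delta = [max(-bound, min(bound, d - lr * g)) for d, g in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    backdoor = [[3.0, 3.0], [3.5, 2.5], [2.5, 3.2]]  # made-up features
    benign = [[3.0, -1.0], [2.0, -2.0]]              # made-up features
    trigger = optimize_trigger(backdoor, benign)
    print("loss before:", total_loss([0.0, 0.0], backdoor, benign))
    print("loss after: ", total_loss(trigger, backdoor, benign))
```

In a real attack the detector and the feature extractor would be neural networks and the gradients would come from automatic differentiation; the sketch only shows how the two loss terms combine into a single trigger objective.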