Multi-Target Federated Backdoor Attack Based on Feature Aggregation

Lingguag Hao, Kuangrong Hao, Bing Wei, Xue-song Tang

TL;DR

This work addresses the limitations of patch-based triggers in federated backdoor attacks with the Multi-Target Federated Backdoor Attack (MT-FBA), which uses image-aligned triggers bounded by an $\epsilon$-ball and fuses local backdoor features across compromised clients via intra-class training. The method generates backdoors for all target classes simultaneously and demonstrates a zero-shot capability: backdoor triggers learned near convergence can activate the global model at inference time with high success rates. A theoretical convergence analysis supports the training procedure, and experiments on CIFAR10, MNIST, and Mini-ImageNet show that MT-FBA outperforms patch-based methods under state-of-the-art defenses such as FLAME while maintaining main-task accuracy. The findings have significant security implications for federated systems and motivate future work on defense strategies and model-agnostic zero-shot backdoor attacks.

Abstract

Current federated backdoor attacks focus on collaboratively training backdoor triggers, where multiple compromised clients train their local trigger patches and then merge them into a global trigger during the inference phase. However, these methods require careful design of the shape and position of trigger patches and lack the feature interactions between trigger patches during training, resulting in poor backdoor attack success rates. Moreover, the pixels of the patches remain untruncated, thereby making abrupt areas in backdoor examples easily detectable by the detection algorithm. To this end, we propose a novel benchmark for the federated backdoor attack based on feature aggregation. Specifically, we align the dimensions of triggers with images, delimit the trigger's pixel boundaries, and facilitate feature interaction among local triggers trained by each compromised client. Furthermore, leveraging the intra-class attack strategy, we propose the simultaneous generation of backdoor triggers for all target classes, significantly reducing the overall production time for triggers across all target classes and increasing the risk of the federated model being attacked. Experiments demonstrate that our method can not only bypass the detection of defense methods while patch-based methods fail, but also achieve a zero-shot backdoor attack with a success rate of 77.39%. To the best of our knowledge, our work is the first to implement such a zero-shot attack in federated learning. Finally, we evaluate attack performance by varying the trigger's training factors, including poison location, ratio, pixel bound, and trigger training duration (local epochs and communication rounds).
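To make the three trigger constraints in the abstract concrete (a trigger with the same dimensions as the image, bounded pixel values, and interaction among local triggers), the following is a minimal sketch and not the authors' implementation. It assumes an L-infinity $\epsilon$-ball, images scaled to [0, 1], and a plain average as the aggregation rule; the function names, the value of `eps`, and the random stand-in triggers are all hypothetical.

```python
import numpy as np

def project_trigger(delta, eps):
    """Clip every trigger pixel into [-eps, eps] (assumed L-infinity ball)."""
    return np.clip(delta, -eps, eps)

def apply_trigger(image, delta, eps):
    """Add an image-sized, eps-bounded trigger and keep pixels in [0, 1]."""
    return np.clip(image + project_trigger(delta, eps), 0.0, 1.0)

def aggregate_triggers(local_deltas, eps):
    """Average local triggers (assumed rule), then re-project onto the ball."""
    mean_delta = np.mean(np.stack(local_deltas), axis=0)
    return project_trigger(mean_delta, eps)

rng = np.random.default_rng(0)
eps = 8 / 255                         # illustrative pixel bound, not from the paper
image = rng.random((32, 32, 3))       # stand-in for a CIFAR10-sized image
# Stand-ins for triggers trained locally by four compromised clients.
local_deltas = [rng.normal(scale=0.1, size=image.shape) for _ in range(4)]

global_delta = aggregate_triggers(local_deltas, eps)
poisoned = apply_trigger(image, global_delta, eps)
```

Because the trigger covers the whole image and every pixel change is at most `eps`, the poisoned sample has no abrupt patch region, which is the property the abstract contrasts with patch-based triggers.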

Paper Structure

This paper contains 41 sections, 4 theorems, 22 equations, 11 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Let Assumptions 1 to 4 hold and $L, \mu, \sigma_p, G$ be defined therein. Specify $\xi=\frac{L}{\mu}$, $\lambda=\max\{8\xi, E\}$ and the learning rate $\alpha^t=\frac{2}{\mu(\lambda+t)}$. Training the backdoor triggers in all compromised clients satisfies: where
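The step-size schedule stated in Theorem 1 can be evaluated directly from its definitions. The sketch below computes $\xi$, $\lambda$, and $\alpha^t$ exactly as written above; the numeric values of `L`, `mu`, and `E` are illustrative assumptions, and `E` is treated simply as a given positive constant because its definition is not part of this summary.

```python
def step_size(t, L, mu, E):
    """Learning rate alpha^t = 2 / (mu * (lambda + t)) from Theorem 1."""
    xi = L / mu                # xi = L / mu
    lam = max(8.0 * xi, E)     # lambda = max{8 * xi, E}
    return 2.0 / (mu * (lam + t))

# Illustrative constants (assumptions, not values from the paper).
L, mu, E = 4.0, 1.0, 5
schedule = [step_size(t, L, mu, E) for t in range(0, 64, 16)]
```

With these constants $\lambda = 32$, so the schedule starts at $2/32$ and decays on the order of $1/t$ as $t$ grows.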

Figures (11)

  • Figure 1: Comparison of backdoor training examples between (a) the patch-based attack and (b) our proposed feature-based attack. The top row shows some of the backdoor triggers, while the bottom row displays resulting backdoor samples from compromised clients on the CIFAR10 dataset.
  • Figure 2: Illustration of backdoor test samples generated using patch-based backdoor attacks on the CIFAR10 dataset.
  • Figure 3: Illustration of the classification boundary of backdoor training samples generated by the patch-based attack (yellow point) and the intra-class attack (blue point).
  • Figure 4: Overview of our Multi-Target Federated Backdoor Attack (MT-FBA). Our method involves three training steps, where the top represents the state of the federated model and the bottom represents the specific operation of each step.
  • Figure 5: ASR (%) and MA (%) for different impact factors during trigger training on the CIFAR10 dataset. Numbers represent each class of backdoor trigger.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Lemma 1: Results of one step SGD
  • Lemma 2: Bounding the variance
  • Lemma 3: Bounding the divergence of $\{\delta^t_p\}$
  • Proof 1