Multi-Target Federated Backdoor Attack Based on Feature Aggregation
Lingguang Hao, Kuangrong Hao, Bing Wei, Xue-song Tang
TL;DR
This work addresses the limitations of patch-based triggers in federated backdoor attacks by proposing the Multi-Target Federated Backdoor Attack (MT-FBA), which uses image-aligned triggers bounded by an $\epsilon$-ball and fuses local backdoor features across compromised clients via intra-class training. The method generates backdoors for all target classes simultaneously and demonstrates a zero-shot capability: backdoor triggers learned near convergence can activate the global model at inference time with high success rates. A theoretical convergence analysis supports the training procedure, and extensive experiments on CIFAR10, MNIST, and Mini-ImageNet show that MT-FBA outperforms patch-based methods under state-of-the-art defenses such as FLAME while maintaining main-task accuracy. The findings reveal significant security implications for federated systems and motivate future work on defense strategies and model-agnostic zero-shot backdoor attacks.
Abstract
Current federated backdoor attacks focus on collaboratively training backdoor triggers: multiple compromised clients each train a local trigger patch, and the patches are merged into a global trigger during the inference phase. However, these methods require careful design of the shape and position of the trigger patches and lack feature interaction between patches during training, resulting in poor attack success rates. Moreover, the pixel values of the patches are not bounded, so the abrupt regions they create in backdoor examples are easily detected by detection algorithms. To this end, we propose a novel benchmark for federated backdoor attacks based on feature aggregation. Specifically, we align the dimensions of triggers with those of the images, bound the trigger's pixel values, and facilitate feature interaction among the local triggers trained by each compromised client. Furthermore, leveraging an intra-class attack strategy, we generate backdoor triggers for all target classes simultaneously, which significantly reduces the total time needed to produce the triggers and increases the risk of the federated model being attacked. Experiments demonstrate that our method not only bypasses defense methods that patch-based methods fail to evade, but also achieves a zero-shot backdoor attack with a success rate of 77.39%. To the best of our knowledge, our work is the first to implement such a zero-shot attack in federated learning. Finally, we evaluate attack performance while varying the trigger's training factors, including poison location, poison ratio, pixel bound, and trigger training duration (local epochs and communication rounds).
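To make the trigger construction concrete, the following is a minimal sketch of an image-sized, bounded trigger and a simple way to combine local triggers. It is an illustration under stated assumptions, not the authors' implementation: the bound is assumed to be an L-infinity bound of radius `eps`, images are flattened lists of pixel values in [0, 1], and aggregation is assumed to be a plain element-wise mean. The function names `project_trigger`, `apply_trigger`, and `aggregate_triggers` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]  # flattened image or trigger; pixels assumed to lie in [0, 1]


def clip(values: Sequence[float], low: float, high: float) -> Vector:
    """Clamp every element to the interval [low, high]."""
    return [min(max(v, low), high) for v in values]


def project_trigger(trigger: Sequence[float], eps: float) -> Vector:
    """Keep the trigger inside an assumed L-infinity ball of radius eps."""
    return clip(trigger, -eps, eps)


def apply_trigger(image: Sequence[float], trigger: Sequence[float]) -> Vector:
    """Add an image-sized trigger to an image and keep pixels valid."""
    if len(image) != len(trigger):
        raise ValueError("trigger must have the same dimensions as the image")
    return clip([p + t for p, t in zip(image, trigger)], 0.0, 1.0)


def aggregate_triggers(local_triggers: Sequence[Sequence[float]], eps: float) -> Vector:
    """Combine local triggers by element-wise mean (an assumption made for
    illustration), then project the result back into the eps-ball."""
    if not local_triggers:
        raise ValueError("need at least one local trigger")
    n = len(local_triggers)
    mean = [sum(column) / n for column in zip(*local_triggers)]
    return project_trigger(mean, eps)


if __name__ == "__main__":
    # Two compromised clients, each holding a 3-pixel local trigger.
    global_trigger = aggregate_triggers([[0.1, -0.2, 0.0], [0.3, 0.2, 0.0]], eps=0.1)
    poisoned = apply_trigger([0.95, 0.5, 0.0], global_trigger)
    print(global_trigger, poisoned)
```

Because the trigger covers the whole image and every element is bounded by `eps`, there is no patch shape or position to design, and no single region of the poisoned image changes abruptly.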
