Invariant Aggregator for Defending against Federated Backdoor Attacks
Xiaoyang Wang, Dimitrios Dimitriadis, Sanmi Koyejo, Shruti Tople
TL;DR
The paper studies backdoor vulnerabilities in federated learning over flat loss landscapes and introduces an invariant aggregator that steers the aggregated update toward invariant directions. By combining an AND-mask that enforces per-dimension sign consistency with a per-dimension trimmed mean, the method suppresses update elements that benefit only malicious clients or outliers. The authors provide theoretical results linking flatness, attack success, and convergence toward benign minima, along with empirical results showing that the backdoor success rate drops by approximately $61.6\%$ at a cost of about $1.2\%$ in benign accuracy across multiple datasets and attack strategies. The defense targets federated settings in which adversaries are a minority of clients and is evaluated on several data modalities.
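To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes each client update is a flat list of floats, that sign consistency is measured as the absolute value of the average sign per dimension, and that the trimmed mean drops a fixed number of the largest and smallest values per dimension. The function names and the parameters `threshold` and `trim_count` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def sign_agreement_mask(updates: Sequence[Sequence[float]], threshold: float) -> List[float]:
    """Return 1.0 for dimensions whose signs agree across clients, else 0.0.

    Agreement is |mean of signs| in [0, 1]; this measure is an assumption.
    """
    n = len(updates)
    dim = len(updates[0])
    mask = []
    for j in range(dim):
        sign_sum = sum((u[j] > 0) - (u[j] < 0) for u in updates)
        mask.append(1.0 if abs(sign_sum) / n >= threshold else 0.0)
    return mask


def trimmed_mean(updates: Sequence[Sequence[float]], trim_count: int) -> List[float]:
    """Per-dimension mean after dropping the trim_count largest and smallest values."""
    n = len(updates)
    if trim_count < 0 or 2 * trim_count >= n:
        raise ValueError("trim_count must leave at least one value per dimension")
    dim = len(updates[0])
    result = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[trim_count:n - trim_count]
        result.append(sum(kept) / len(kept))
    return result


def invariant_aggregate(
    updates: Sequence[Sequence[float]],
    threshold: float = 0.6,   # hypothetical default
    trim_count: int = 1,      # hypothetical default
) -> List[float]:
    """Zero out sign-inconsistent dimensions, then use the trimmed mean elsewhere."""
    mask = sign_agreement_mask(updates, threshold)
    mean = trimmed_mean(updates, trim_count)
    return [m * v for m, v in zip(mask, mean)]


# Toy example: four benign clients and one client with outsized values
# in the second and third dimensions.
updates = [
    [0.1, 0.2, -0.1],
    [0.2, 0.1, 0.1],
    [0.1, 0.3, -0.2],
    [0.3, 0.2, 0.2],
    [0.2, 5.0, 3.0],
]
aggregated = invariant_aggregate(updates)
print(aggregated)
```

In this toy input, the third dimension has mixed signs across clients, so the mask removes it entirely, while the second dimension passes the sign check and the trimmed mean discards the outsized value of 5.0 before averaging.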
Abstract
Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models' utility, defending against backdoor attacks, which increase model accuracy on backdoor samples exclusively without hurting utility on other samples, remains challenging. To this end, we first analyze the failure modes of existing defenses over a flat loss landscape, which is common for well-designed neural networks such as ResNet (He et al., 2015) but is often overlooked by previous works. Then, we propose an invariant aggregator that redirects the aggregated update to invariant directions that are generally useful by selectively masking out the update elements that favor few and possibly malicious clients. Theoretical results suggest that our approach provably mitigates backdoor attacks and remains effective over flat loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks at a negligible cost to model utility.
