Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning
Yujie Zhang, Neil Gong, Michael K. Reiter
TL;DR
This work addresses backdoor vulnerabilities in Federated Learning by introducing DPOT, a data-poisoning attack that dynamically optimizes a backdoor trigger to align malicious and benign updates, thereby concealing malicious activity from defenses that monitor client updates. DPOT builds a round-specific trigger training dataset, selects trigger locations by gradient magnitudes, and refines trigger values via gradient descent, all without model poisoning. Theoretical analysis demonstrates why trigger optimization reduces backdoor loss and conceals updates, and extensive experiments across four datasets and ten defenses show DPOT achieves high attack effectiveness while preserving main-task performance, outperforming fixed-trigger and A3FL approaches. The findings highlight weaknesses in current defenses and motivate the development of defenses that account for dynamic, data-driven backdoors in privacy-preserving FL systems, with avenues for extending to other tasks and examining attack timing and cost.
Abstract
Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks, in which adversaries poison the local training data of a subset of clients using a backdoor trigger, aiming to make the aggregated model produce malicious results whenever an inference-time input contains the same trigger. Existing backdoor attacks in FL share two deficiencies: fixed trigger patterns and reliance on model poisoning. State-of-the-art defenses that analyze clients' model updates perform well against these attacks because malicious and benign client model updates diverge significantly. To effectively conceal malicious model updates among benign ones, we propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger so that backdoor data has minimal effect on model updates. We provide theoretical justification for DPOT's attacking principle and present experimental results showing that DPOT, using only data poisoning, effectively undermines state-of-the-art defenses and outperforms existing backdoor attack techniques on various datasets.
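The two trigger-optimization steps named in the TL;DR (choosing trigger locations by gradient magnitude, then refining trigger values by gradient descent) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy logistic-regression model in place of the current global model, flattened inputs scaled to [0, 1], and a single binary target label. All function names and parameters (`optimize_trigger`, `k`, `steps`, `lr`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def apply_trigger(x, mask, values):
    # Overwrite the masked features of every sample with the trigger values.
    return x * (1.0 - mask) + values * mask

def backdoor_loss_and_input_grad(w, b, x, target):
    # Cross-entropy toward the attacker's target label, plus the gradient of
    # that loss with respect to the input features, averaged over samples.
    p = sigmoid(x @ w + b)
    eps = 1e-12
    loss = -np.mean(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))
    grad = np.mean((p - target)[:, None] * w[None, :], axis=0)
    return loss, grad

def optimize_trigger(w, b, x, target=1.0, k=4, steps=50, lr=0.5):
    # Step 1 (assumed form): pick the k features whose averaged
    # input-gradient has the largest magnitude.
    _, g = backdoor_loss_and_input_grad(w, b, x, target)
    idx = np.argsort(-np.abs(g))[:k]
    mask = np.zeros(x.shape[1])
    mask[idx] = 1.0

    # Step 2 (assumed form): gradient descent on the trigger values only,
    # clipped to the valid input range [0, 1].
    values = x.mean(axis=0)
    for _ in range(steps):
        _, g = backdoor_loss_and_input_grad(w, b, apply_trigger(x, mask, values), target)
        values = np.clip(values - lr * g * mask, 0.0, 1.0)
    return mask, values
```

In this toy setting, lowering the backdoor loss under the current model means the triggered data already looks nearly "correct" to that model, which is the intuition the abstract gives for why such data should perturb local updates less than a fixed trigger would. In a real FL round the model, the trigger training dataset, and the loss would come from the attack setting rather than from these placeholders.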
