Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

Yujie Zhang, Neil Gong, Michael K. Reiter

TL;DR

This work addresses backdoor vulnerabilities in Federated Learning (FL) by introducing DPOT, a data-poisoning attack that dynamically optimizes a backdoor trigger to align malicious and benign updates, thereby concealing malicious activity from defenses that monitor client updates. DPOT builds a round-specific trigger training dataset, selects trigger locations by gradient magnitude, and refines trigger values via gradient descent, all without model poisoning. Theoretical analysis shows why trigger optimization reduces backdoor loss and conceals updates, and experiments across four datasets and ten defenses show that DPOT achieves high attack effectiveness while preserving main-task performance, outperforming fixed-trigger attacks and A3FL. The findings expose weaknesses in current defenses and motivate defenses that account for dynamic, data-driven backdoors in privacy-preserving FL systems; open directions include extending DPOT to other tasks and examining attack timing and cost.
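The TL;DR names two trigger-optimization steps: choosing trigger locations by gradient magnitude and refining trigger values by gradient descent. The following is a minimal sketch of those two steps under stated assumptions, not the paper's implementation: the global model is a hypothetical binary logistic-regression classifier (`w`, `b`) over flattened inputs in [0, 1], the backdoor loss is binary cross-entropy toward the target label, locations are the top-`k` positions by summed absolute input gradient, and values are updated by projected gradient descent clipped to [0, 1]. All function names, the toy data, and the hyperparameters (`k`, `steps`, `lr`, `init`) are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def apply_trigger(x, mask, trigger):
    """Overwrite the masked input positions with trigger values."""
    return [t if m else xi for xi, m, t in zip(x, mask, trigger)]

def input_grad(w, b, x, target):
    """Gradient of binary cross-entropy w.r.t. the input x for label `target` (0 or 1)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - target) * wi for wi in w]

def backdoor_loss(w, b, data, mask, trigger, target):
    """Mean cross-entropy of triggered samples against the backdoor label."""
    total = 0.0
    for x in data:
        xt = apply_trigger(x, mask, trigger)
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, xt)) + b)
        total -= math.log(p if target == 1 else 1.0 - p)
    return total / len(data)

def select_locations(w, b, data, target, k):
    """Mask of the k input positions with the largest summed |gradient| of the backdoor loss."""
    score = [0.0] * len(w)
    for x in data:
        for i, g in enumerate(input_grad(w, b, x, target)):
            score[i] += abs(g)
    top = set(sorted(range(len(w)), key=lambda i: score[i], reverse=True)[:k])
    return [i in top for i in range(len(w))]

def optimize_trigger(w, b, data, target, mask, steps=100, lr=0.5, init=0.5):
    """Projected gradient descent on the masked trigger values, clipped to [0, 1]."""
    trigger = [init] * len(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in data:
            # The triggered input equals the trigger at masked positions,
            # so d(loss)/d(trigger_i) is the input gradient at position i.
            g = input_grad(w, b, apply_trigger(x, mask, trigger), target)
            for i, m in enumerate(mask):
                if m:
                    grad[i] += g[i] / len(data)
        trigger = [min(1.0, max(0.0, t - lr * gi)) if m else t
                   for t, gi, m in zip(trigger, grad, mask)]
    return trigger

# Toy usage: a 4-pixel "image", backdoor target label 1, trigger size k = 2.
w, b = [0.1, -2.0, 0.05, 3.0], 0.0
data = [[0.2, 0.9, 0.4, 0.1], [0.8, 0.3, 0.6, 0.0], [0.5, 0.5, 0.5, 0.5]]
mask = select_locations(w, b, data, target=1, k=2)
trigger = optimize_trigger(w, b, data, target=1, mask=mask)
```

With a linear model the input gradient is proportional to `w`, so the selected locations reduce to the largest `|w_i|`; with a deep network the ranking would also depend on the inputs and on the current global model, so it can change from round to round.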

Abstract

Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks, where adversaries poison the local training data of a subset of clients using a backdoor trigger, aiming to make the aggregated model produce malicious results when the same backdoor condition is met by an inference-time input. Existing backdoor attacks in FL share common deficiencies: fixed trigger patterns and reliance on the assistance of model poisoning. State-of-the-art defenses based on analyzing clients' model updates perform well against these attacks because malicious and benign client model updates diverge significantly. To effectively conceal malicious model updates among benign ones, we propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger so that backdoor data have minimal effect on model updates. We provide theoretical justifications for DPOT's attack principle and present experimental results showing that DPOT, using data poisoning alone, effectively undermines state-of-the-art defenses and outperforms existing backdoor attack techniques on various datasets.
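The abstract attributes the success of update-analysis defenses to the divergence between malicious and benign updates. As a generic illustration only (not any of the ten defenses evaluated in the paper), the sketch below flags a client update whose cosine similarity to the coordinate-wise median update falls below a threshold. The function names, the 0.5 threshold, and the toy update vectors are hypothetical. DPOT aims to produce malicious updates that pass checks of this kind.

```python
import math
import statistics

def cosine(u, v):
    """Cosine similarity between two flattened update vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def coordinate_median(updates):
    """Per-coordinate median across all client updates, used as a robust reference."""
    return [statistics.median(coords) for coords in zip(*updates)]

def flag_divergent(updates, threshold=0.5):
    """Flag each client whose update direction disagrees with the median update."""
    ref = coordinate_median(updates)
    return [cosine(u, ref) < threshold for u in updates]

# Toy updates: three similar clients and one pointing in a different direction.
updates = [
    [1.0, 0.9, 1.1],
    [0.9, 1.0, 1.0],
    [1.1, 1.0, 0.9],
    [-1.0, 2.0, -0.5],
]
flags = flag_divergent(updates)
```

The median is used as the reference because, unlike the mean, it is not pulled toward the outlier being tested.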

Paper Structure

This paper contains 39 sections, 3 theorems, 26 equations, 17 figures, 10 tables, and 2 algorithms.

Key Result

Proposition 5.1

Given a model $\beta$ and a data sample $x$ with benign predicted value $\hat{y}$ and backdoor predicted value $y_t$, optimizing objective (eq:optbackdoor) guarantees that objective (eq:attackgoal) is also optimized.

Figures (17)

  • Figure 1: An overview of related works on backdoor attacks in FL.
  • Figure 2: An overview of related works on defenses against backdoor attacks in FL.
  • Figure 3: Overview of the DPOT attack process on an FL system within Trusted Execution Environments (TEEs). In each global round of FL, the DPOT attack comprises three key stages: construction of a Trigger Training Dataset, Trigger Optimization, and Data Poisoning on the malicious clients. In this figure, Clients #1, #2, and #3 act as malicious clients, while the other clients (e.g., Client #n) are benign.
  • Figure 4: Poisoned data with DPOT triggers.
  • Figure 5: Representative results on four datasets. The attack settings correspond to the default settings listed in the default-settings table (tbl:defaultsetting).
  • ...and 12 more figures
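Figure 3's final stage poisons the malicious clients' local data with the optimized trigger. The following is a minimal sketch of that stage under stated assumptions: a hypothetical poisoning `rate`, a boolean `mask` and a `trigger` of the same length as each flattened sample, and relabeling of every poisoned sample to the backdoor target label. This section does not say whether DPOT replaces clean samples or appends poisoned copies; the sketch simply replaces them. All names and toy values are hypothetical.

```python
import random

def apply_trigger(x, mask, trigger):
    """Overwrite masked positions of a flattened sample with trigger values."""
    return [t if m else xi for xi, m, t in zip(x, mask, trigger)]

def poison_local_dataset(samples, labels, mask, trigger, target, rate, seed=0):
    """Stamp the trigger on a `rate` fraction of samples and relabel them to `target`.

    Returns new sample and label lists plus the set of poisoned indices;
    the input lists are not modified.
    """
    rng = random.Random(seed)
    n_poison = int(round(rate * len(samples)))
    chosen = set(rng.sample(range(len(samples)), n_poison))
    out_x, out_y = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i in chosen:
            out_x.append(apply_trigger(x, mask, trigger))
            out_y.append(target)
        else:
            out_x.append(list(x))
            out_y.append(y)
    return out_x, out_y, chosen

# Toy usage: 10 four-pixel samples, trigger on positions 1 and 3, 30% poisoning.
samples = [[i / 10] * 4 for i in range(10)]
labels = [i % 3 for i in range(10)]
mask = [False, True, False, True]
trigger = [0.0, 0.0, 0.0, 1.0]
px, py, chosen = poison_local_dataset(samples, labels, mask, trigger, target=9, rate=0.3)
```

A fixed `seed` keeps the choice of poisoned samples reproducible across runs of the toy example.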

Theorems & Definitions (10)

  • Definition 5.1
  • Definition 5.2
  • Proposition 5.1
  • Proof
  • Proposition 5.2
  • Proof
  • Proof
  • Proposition B.1
  • Proof
  • Proof