How To Backdoor Federated Learning

Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov

TL;DR

This work reveals a fundamental vulnerability in federated learning: a single compromised participant can replace the global model with a backdoored version using a model-replacement attack, maintaining task accuracy while enabling attacker-controlled behavior. By introducing constrain-and-scale and train-and-scale techniques, the authors demonstrate that backdoors can be injected in a single round and persist across many subsequent rounds, even against defenses that rely on anomaly detection. The attacks outperform traditional data-poisoning approaches and exploit the privacy-preserving design (secure aggregation) that prevents auditing of updates. The findings highlight the need for robust, integrity-preserving federated learning defenses that do not rely on inspecting private data or updates. Overall, the paper underscores the tension between privacy guarantees and model integrity in large-scale distributed learning and motivates future work on secure-by-design defenses.

Abstract

Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, e.g., to ensure that an image classifier assigns an attacker-chosen label to images with certain features, or that a word predictor completes certain sentences with an attacker-chosen word. We design and evaluate a new model-poisoning methodology based on model replacement. An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task. We evaluate the attack under different assumptions for the standard federated-learning tasks and show that it greatly outperforms data poisoning. Our generic constrain-and-scale technique also evades anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during training.
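The "model replacement" idea in the abstract can be illustrated with toy arithmetic. The sketch below is not the authors' implementation. It assumes a commonly used federated-averaging rule, G_next = G + (eta / n) * sum(L_i - G), and a scaling factor gamma = n / eta; the abstract states neither. The names `federated_average`, `scaled_submission`, `G`, `X`, and all numeric values are hypothetical.

```python
# Toy arithmetic only: models are short lists of floats, not neural networks.
# Assumed (not stated in the abstract): the server rule
#   G_next = G + (eta / n) * sum_i (L_i - G)
# and the scaling factor gamma = n / eta.

def federated_average(global_model, local_models, n, eta):
    """One aggregation round over the submitted local models."""
    summed_delta = [0.0] * len(global_model)
    for local in local_models:
        for j, (l, g) in enumerate(zip(local, global_model)):
            summed_delta[j] += l - g
    return [g + (eta / n) * d for g, d in zip(global_model, summed_delta)]


def scaled_submission(global_model, target_model, gamma):
    """Scale the target's offset from the global model by gamma."""
    return [gamma * (x - g) + g for x, g in zip(target_model, global_model)]


n, eta = 10, 1.0                         # hypothetical participant count and global learning rate
G = [0.0, 0.0]                           # current global model
X = [1.0, -2.0]                          # model the submitter wants the server to end up with
benign = [[0.01, -0.01], [-0.01, 0.02]]  # near-converged honest updates

submitted = scaled_submission(G, X, gamma=n / eta)
G_next = federated_average(G, benign + [submitted], n, eta)
print(G_next)  # close to X; the small gap comes from the benign deltas
```

Under these assumptions the averaging step divides the scaled offset back down by the same factor, so the aggregate lands near `X` whenever the honest updates stay close to the current global model. This is why a single selected participant can be enough.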

Paper Structure

This paper contains 24 sections, 6 equations, 15 figures, 1 table, 2 algorithms.

Figures (15)

  • Figure 1: Overview of the attack. The attacker compromises one or more of the participants, trains a model on the backdoor data using our constrain-and-scale technique, and submits the resulting model, which replaces the joint model as the result of federated averaging.
  • Figure 2: Examples of semantic backdoors. (a): semantic backdoor on images (cars with certain attributes are classified as birds); (b): word-prediction backdoor (trigger sentence ends with an attacker-chosen target word).
  • Figure 3: Modified loss for the word-prediction backdoor. (a) Standard word prediction: the loss is computed on every output. (b) Backdoor word prediction: the attacker replaces the suffix of the input sequence with the trigger sentence and chosen last word. The loss is only computed on the last word.
  • Figure 4: Backdoor accuracy. a+b: CIFAR classification with semantic backdoor; c+d: word prediction with semantic backdoor. a+c: single-shot attack; b+d: repeated attack.
  • Figure 5: Pixel-pattern backdoor. Backdoored model misclassifies all images with a custom pixel pattern as birds. The results are similar to semantic backdoors.
  • ...and 10 more figures
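
The distinction drawn in the Figure 3 caption (loss on every output versus loss on the last word only) can be sketched in a few lines. This is a minimal illustration, assuming per-position cross-entropy over toy probability vectors. The function names and numbers are hypothetical and do not come from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target token id."""
    return -math.log(probs[target])


def every_position_loss(step_probs, targets):
    """Panel (a): average the loss over all output positions."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, targets)]
    return sum(losses) / len(losses)


def last_word_loss(step_probs, targets):
    """Panel (b): only the final position contributes to the loss."""
    return cross_entropy(step_probs[-1], targets[-1])


# Hypothetical two-step sequence over a two-token vocabulary.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
targets = [0, 1]

standard = every_position_loss(step_probs, targets)
last_only = last_word_loss(step_probs, targets)
```

With the last-word variant, gradients depend only on how well the final target word is predicted, which matches the caption's description of the backdoor objective.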