DarkFed: A Data-Free Backdoor Attack in Federated Learning

Minghui Li, Wei Wan, Yuxuan Ning, Shengshan Hu, Lulu Xue, Leo Yu Zhang, Yichen Wang

TL;DR

DarkFed addresses the gap between idealized FL backdoor research and real-world constraints with a data-free backdoor attack built on emulated fake clients and shadow datasets. It combines a shadow-data objective with property mimicry, which makes backdoor updates resemble benign ones, to preserve main-task accuracy while delivering an effective backdoor and evading defenses. The reported results show near-total backdoor success with minimal loss of main-task accuracy across multiple datasets and defense types, even when the shadow dataset is synthetic. The work widens the practical risk surface of FL by showing that backdoors can be injected without task-specific data, which calls for defenses that do not rely on data-dependent threat models.

Abstract

Federated learning (FL) has been demonstrated to be susceptible to backdoor attacks. However, existing academic studies on FL backdoor attacks rely on a high proportion of real clients with main task-related data, which is impractical. In the context of real-world industrial scenarios, even the simplest defense suffices to defend against the state-of-the-art attack, 3DFed. A practical FL backdoor attack remains in a nascent stage of development. To bridge this gap, we present DarkFed. Initially, we emulate a series of fake clients, thereby achieving the attacker proportion typical of academic research scenarios. Given that these emulated fake clients lack genuine training data, we further propose a data-free approach to backdoor FL. Specifically, we delve into the feasibility of injecting a backdoor using a shadow dataset. Our exploration reveals that impressive attack performance can be achieved, even when there is a substantial gap between the shadow dataset and the main task dataset. This holds true even when employing synthetic data devoid of any semantic information as the shadow dataset. Subsequently, we strategically construct a series of covert backdoor updates in an optimized manner, mimicking the properties of benign updates, to evade detection by defenses. A substantial body of empirical evidence validates the tangible effectiveness of DarkFed.
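
The role of the emulated fake clients can be made concrete with a little arithmetic. The sketch below is not the paper's method; it only illustrates how adding fake clients raises the attacker's share of participants. The function names, the percentage-based interface, and every number in the usage note are assumptions chosen for illustration.

```python
def attacker_fraction(benign: int, real_malicious: int, fake: int) -> float:
    """Fraction of all participating clients controlled by the attacker."""
    total = benign + real_malicious + fake
    return (real_malicious + fake) / total


def fake_clients_needed(benign: int, real_malicious: int, target_percent: int) -> int:
    """Smallest number of fake clients f such that

        (real_malicious + f) / (benign + real_malicious + f) >= target_percent / 100.

    Integer arithmetic is used throughout to avoid floating-point rounding
    at exact boundaries.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in [0, 100)")
    # Rearranged inequality:
    #   f * (100 - p) >= p * (benign + real_malicious) - 100 * real_malicious
    numerator = target_percent * (benign + real_malicious) - 100 * real_malicious
    denominator = 100 - target_percent
    # -(-a // b) is ceiling division for a positive denominator.
    return max(0, -(-numerator // denominator))
```

For instance, under the hypothetical assumption of 990 benign clients and 10 compromised real clients (a 1% attacker share), `fake_clients_needed(990, 10, 20)` gives the number of fake clients required to reach a 20% share, and `attacker_fraction` can be used to check the resulting proportion.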

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Visual comparison of the shadow datasets.
  • Figure 2: Illustration of property mimicry.
  • Figure 3: Attack performance on CIFAR-10 (first row), CIFAR-100 (second row), and GTSRB (third row).
  • Figure 4: Attack performance on CIFAR-10 with synthetic dataset.