Adversarially Guided Stateful Defense Against Backdoor Attacks in Federated Deep Learning

Hassan Ali, Surya Nepal, Salil S. Kanhere, Sanjay Jha

TL;DR

In realistic FL settings, where SOTA defenses mostly fail to resist attacks, AGSD mostly outperforms all SOTA defenses with a minimal drop in clean accuracy, even when (a) only a very small held-out dataset is given, or (b) no held-out dataset is available and out-of-distribution data is used instead.

Abstract

Recent works have shown that Federated Learning (FL) is vulnerable to backdoor attacks. Existing defenses cluster submitted updates from clients and select the best cluster for aggregation. However, they often rely on unrealistic assumptions regarding client submissions and the sampled client population while choosing the best cluster. We show that in realistic FL settings, state-of-the-art (SOTA) defenses struggle to perform well against backdoor attacks in FL. To address this, we highlight that backdoored submissions are adversarially biased and overconfident compared to clean submissions. We, therefore, propose an Adversarially Guided Stateful Defense (AGSD) against backdoor attacks on Deep Neural Networks (DNNs) in FL scenarios. AGSD applies adversarial perturbations to a small held-out dataset to compute a novel metric, called the trust index, that guides the cluster selection without relying on any unrealistic assumptions regarding client submissions. Moreover, AGSD maintains a trust state history of each client that adaptively penalizes backdoored clients and rewards clean clients. In realistic FL settings, where SOTA defenses mostly fail to resist attacks, AGSD mostly outperforms all SOTA defenses with a minimal drop in clean accuracy (5% in the worst case compared to the best accuracy), even when (a) given a very small held-out dataset (AGSD typically assumes 50 samples, <= 0.1% of the training data), and (b) no held-out dataset is available and out-of-distribution data is used instead. For reproducibility, our code will be openly available at: https://github.com/hassanalikhatim/AGSD.
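
The stateful part of the defense can be pictured with a minimal sketch. It is not the authors' implementation: the only detail taken from the paper is that best-cluster clients with a negative trust history $\phi_i$ are excluded from aggregation (Figure 4 caption). The function names, the fixed reward and penalty of 1.0, and the choice to filter before updating the history are all assumptions made for illustration.

```python
from collections import defaultdict


def select_for_aggregation(history, best_cluster):
    """Keep best-cluster clients whose trust history phi_i is not negative."""
    return [client for client in best_cluster if history[client] >= 0]


def update_trust_history(history, sampled_clients, best_cluster,
                         reward=1.0, penalty=1.0):
    """Hypothetical update rule: reward clients inside the selected cluster
    and penalize the other sampled clients. The paper's actual rule is
    adaptive and is not reproduced here."""
    best = set(best_cluster)
    for client in sampled_clients:
        history[client] += reward if client in best else -penalty
    return history


# One illustrative round with four sampled clients.
history = defaultdict(float)
history["c3"] = -2.0  # assumption: c3 was penalized in earlier rounds
sampled = ["c1", "c2", "c3", "c4"]
best_cluster = ["c1", "c3", "c4"]  # assumed output of the cluster selection

aggregated = select_for_aggregation(history, best_cluster)
update_trust_history(history, sampled, best_cluster)
```

In this toy round, `c3` lands in the selected cluster but is still left out of aggregation because of its negative history, which is the effect a stateful defense is meant to have on clients that behaved suspiciously earlier.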

Paper Structure

This paper contains 15 sections, 16 equations, 18 figures, 5 tables, 1 algorithm.

Figures (18)

  • Figure 1: Among randomly sampled clients, malicious clients $n_-$ may outnumber the sampled clean clients $n_+$ in many rounds (e.g., $\frac{n_-}{n_-+n_+} \geq 0.5$ in the figure), invalidating SOTA defenses' assumption [nguyen2022flame, fung2018mitigating, krauss2023mesas, zhang2023flip], thereby backdooring the defenses. (Settings: All settings are similar to MESAS and Flame, except that the clients are sampled randomly).
  • Figure 2: Standard deviation of the output classes of clean and backdoored classifiers for adversarial inputs.
  • Figure 3: Confidence of clean and backdoored classifiers when classifying adversarial inputs.
  • Figure 4: Illustration of how AGSD works in four steps for training round $t$, where the number of sampled clients $c$ is assumed to be 4. AGSD maintains a trust history $\phi_i$ of each client. After clustering client submissions in Step 2, AGSD uses a novel method to compute the trust index $\gamma_i$ of each submission in Step 3 to identify the best cluster for the update. In Step 4, clients of the best cluster with $\phi_i < 0$ are ruled out of aggregation.
  • Figure 5: AGSD uses an improved clustering metric based on the difference from the preliminary aggregated model. This lets AGSD distinguish adaptive clients, unlike SOTA defenses.
  • ...and 13 more figures
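
The observation behind Figure 1, that randomly sampled rounds can contain at least as many malicious as clean clients, can be checked with a short calculation. The sketch below assumes clients are drawn uniformly without replacement, so the number of malicious clients in a round follows a hypergeometric distribution. The function name and the example population sizes are illustrative assumptions, not the paper's experimental settings.

```python
from math import comb


def prob_malicious_majority(total_clients, malicious_clients, sampled_per_round):
    """Probability that n_- / (n_- + n_+) >= 0.5 in one round, when
    sampled_per_round clients are drawn uniformly without replacement."""
    clean_clients = total_clients - malicious_clients
    # Smallest malicious count that makes up at least half of the sample.
    threshold = (sampled_per_round + 1) // 2
    upper = min(malicious_clients, sampled_per_round)
    favourable = sum(
        comb(malicious_clients, k) * comb(clean_clients, sampled_per_round - k)
        for k in range(threshold, upper + 1)
    )
    return favourable / comb(total_clients, sampled_per_round)


# Hypothetical population: 100 clients, 45 of them malicious, 10 sampled per round.
p_round = prob_malicious_majority(100, 45, 10)
print(f"P(malicious share >= 0.5 in a round) = {p_round:.3f}")
```

Even when malicious clients are a minority of the whole population, this per-round probability is nonzero, so over many training rounds some rounds will violate an honest-majority assumption.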