
Invariant Aggregator for Defending against Federated Backdoor Attacks

Xiaoyang Wang, Dimitrios Dimitriadis, Sanmi Koyejo, Shruti Tople

TL;DR

The paper studies backdoor vulnerabilities in federated learning under flat loss landscapes and introduces an invariant aggregator that steers the aggregated update toward invariant directions. The method combines an AND-mask, which enforces per-dimension sign consistency across clients, with a per-dimension trimmed mean, so that update elements benefiting only malicious clients or outliers are suppressed. The authors provide theoretical results linking flatness, attack success, and convergence toward benign minima, along with empirical results showing substantial reductions in backdoor success rates (approximately $61.6\%$) with minimal loss in benign accuracy (about $1.2\%$) across multiple datasets and attack strategies. The defense targets federated settings in which adversaries are a minority of the clients, and it is evaluated on data of different modalities.

Abstract

Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models' utility, defense against backdoor attacks that increase model accuracy on backdoor samples exclusively without hurting the utility on other samples remains challenging. To this end, we first analyze the failure modes of existing defenses over a flat loss landscape, which is common for well-designed neural networks such as ResNet (He et al., 2015) but is often overlooked by previous works. Then, we propose an invariant aggregator that redirects the aggregated update to invariant directions that are generally useful by selectively masking out the update elements that favor a few, possibly malicious, clients. Theoretical results suggest that our approach provably mitigates backdoor attacks and remains effective over flat loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks with a negligible cost on the model utility.
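
The masking idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the AND-mask keeps a dimension only when the fraction of clients sharing the majority sign reaches a threshold, and that kept dimensions are aggregated with a trimmed mean. The names `invariant_aggregate`, `tau`, and `trim_frac`, and their default values, are hypothetical choices made for this example.

```python
from typing import List, Sequence


def sign_agreement(values: Sequence[float]) -> float:
    """Fraction of clients that share the majority sign in one dimension."""
    pos = sum(1 for v in values if v > 0)
    neg = sum(1 for v in values if v < 0)
    return max(pos, neg) / len(values)


def trimmed_mean(values: Sequence[float], trim_frac: float) -> float:
    """Mean after dropping the k smallest and k largest values."""
    if not 0.0 <= trim_frac < 0.5:
        raise ValueError("trim_frac must be in [0, 0.5)")
    k = int(trim_frac * len(values))
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


def invariant_aggregate(updates: List[List[float]],
                        tau: float = 0.8,          # assumed sign-agreement threshold
                        trim_frac: float = 0.25    # assumed trimming fraction
                        ) -> List[float]:
    """Per dimension: zero it out if signs disagree, else take a trimmed mean."""
    aggregated = []
    for d in range(len(updates[0])):
        column = [u[d] for u in updates]
        if sign_agreement(column) >= tau:
            aggregated.append(trimmed_mean(column, trim_frac))
        else:
            aggregated.append(0.0)  # masked: direction is not invariant across clients
    return aggregated


# Four benign clients with a small (flat) first coordinate, one malicious client.
updates = [
    [0.2, 1.0],
    [0.1, 0.9],
    [-0.05, 1.1],   # benign, but noise flipped the small first coordinate
    [0.15, 1.0],
    [-2.0, 1.0],    # malicious: pushes the first coordinate, mimics the second
]
result = invariant_aggregate(updates)
naive = [sum(u[d] for u in updates) / len(updates) for d in range(2)]
print("invariant:", result)
print("plain mean:", naive)
```

In this toy input the plain mean is pulled negative along the first coordinate by the single large malicious value, while the sketch masks that coordinate (only 3 of 5 clients agree on its sign) and keeps the second coordinate, on which all clients agree.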

Paper Structure

This paper contains 48 sections, 8 theorems, 2 equations, 6 figures, 10 tables, 1 algorithm.

Key Result

Proposition 5

Let $\bm{g}$ be a 2-dimensional (2-d) benign gradient, $\bm{g}'$ be a 2-d malicious gradient, and $\bm{g}^*$ be a 2-d reference gradient estimated over the trust root dataset. Suppose $\bm{g}_0\bm{g}'_0 < 0$ and $\bm{g}_1\bm{g}'_1 > 0$. Under the aggregation rule of FLTrust, which enforces $\|\bm{g}\|$ ...
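
The setting of the proposition (opposite signs in coordinate 0, matching signs in coordinate 1) can be explored numerically. The sketch below assumes the commonly described FLTrust rule: each client gradient receives a trust score equal to the ReLU-clipped cosine similarity with the reference gradient, is rescaled to the reference norm, and the rescaled gradients are averaged with the trust scores as weights. The concrete vectors are made up for illustration, and the function name `fltrust_aggregate` is hypothetical; the example does not reproduce the proposition's exact statement.

```python
import math
from typing import List


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def fltrust_aggregate(client_grads: List[List[float]],
                      ref_grad: List[float]) -> List[float]:
    """Trust-score-weighted average of norm-rescaled client gradients."""
    ref_norm = norm(ref_grad)
    # Trust score: cosine similarity to the reference, clipped at zero.
    scores = [max(0.0, cosine(g, ref_grad)) for g in client_grads]
    total = sum(scores)
    if total == 0.0:
        return [0.0] * len(ref_grad)
    agg = [0.0] * len(ref_grad)
    for score, g in zip(scores, client_grads):
        scale = score * ref_norm / norm(g)  # rescale to the reference norm
        for d, x in enumerate(g):
            agg[d] += scale * x
    return [a / total for a in agg]


# Illustrative vectors (assumptions): the benign gradient is small along
# coordinate 0, as it would be over a flat loss landscape.
g_benign = [0.1, 1.0]
g_malicious = [-1.0, 1.0]   # opposes coordinate 0, mimics coordinate 1
g_ref = [0.1, 1.0]          # reference gradient from the trust root dataset

agg = fltrust_aggregate([g_benign, g_malicious], g_ref)
print("malicious trust score:", max(0.0, cosine(g_malicious, g_ref)))
print("aggregate:", agg)
```

With these vectors the malicious gradient keeps a positive trust score because it agrees with the reference along coordinate 1, and the aggregate ends up with the malicious sign along coordinate 0.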

Figures (6)

  • Figure 1: (a) Overview of our motivating setting where benign minima (with convex hull $\mathcal{W}^*$) and the malicious minimum $\bm{w}'$ are separable. $\bm{w}$ is the parameter in the previous round, and the dashed circles in (b) and (c) are loss contours. (b) The flat landscape of a benign client (blue) along the horizontal axis reduces the horizontal gradient magnitude, allowing a malicious client (red) to easily mislead the aggregated gradient $\bar{\bm{g}}$ toward the malicious minimum $\bm{w}'$. (c) The malicious client can mimic the benign client (red dashed arrow) along the vertical dimension with less penalty due to its flat loss landscape along the vertical axis.
  • Figure 2: Failure mode examples of existing approaches. (a) FLTrust can fail to recover the benign direction (blue) along the horizontal axis, which may subsequently converge model parameters to a malicious minimum (see the flatness study figure). This is because a malicious client (red) can mimic the benign client along the vertical axis to avoid being detected as an anomaly, and misleading the aggregation result along the horizontal axis is easier due to the small benign gradient magnitude caused by the flat loss landscape. (b) Median can fail to recover the benign direction (blue), even if the estimation error is small, when a few benign gradients flip their sign (blue arrow) due to gradient estimation noise. Gradients with smaller magnitudes may be easier to flip for a given noise level.
  • Figure 3: Numerical simulation for the FLTrust equation.
  • Figure 4: Loss landscapes along the direction of malicious and random gradients. Distance is defined using the gradient norm. Distance 0.0 means the parameter stays at a minimum.
  • Figure 5: Gradient element value histogram of a malicious gradient.
  • ...and 1 more figure
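
The median failure mode described in the Figure 2(b) caption can be reproduced with a toy example. This is a sketch under assumed numbers, not data from the paper: seven benign clients share a small true gradient value in one dimension, noise flips the sign of two of their estimates, and four malicious clients (a minority of the eleven) report a large negative value.

```python
from statistics import median

# One gradient dimension, 11 clients in total (all values are illustrative).
benign = [0.1] * 5 + [-0.05] * 2   # two benign estimates flipped by noise
malicious = [-1.0] * 4             # minority of clients, opposite direction

benign_only = median(benign)
with_attack = median(benign + malicious)

print("median over benign clients:", benign_only)
print("median with malicious minority:", with_attack)
```

The benign-only median keeps the benign sign, but once the malicious minority is added, the two flipped benign estimates are enough to move the median to the negative side. A smaller benign magnitude makes such flips more likely for a given noise level, which is the link to flat loss landscapes.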

Theorems & Definitions (16)

  • Definition 4
  • Proposition 5
  • Proposition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • Theorem 10
  • Theorem 11
  • Proposition 5
  • proof
  • ...and 6 more