
TrojanDam: Detection-Free Backdoor Defense in Federated Learning through Proactive Model Robustification utilizing OOD Data

Yanbo Dai, Songze Li, Zihan Gan, Xueluan Gong

TL;DR

This work tackles backdoor threats in federated learning by moving from post-hoc detection to proactive defense. TrojanDam robustifies redundant neurons in the global model at the server using OOD flood and shadow data, applying kernel-level gradient projections and BN-statistics handling to cancel backdoor effects during aggregation. The approach avoids identifying malicious client updates, instead fortifying the model before aggregation and using norm clipping to mitigate adversarial updates. Extensive experiments across CIFAR-10/100 and EMNIST show TrojanDam achieving state-of-the-art backdoor suppression across diverse attack strategies and non-IID settings, with minimal impact on main-task performance, highlighting its practical potential for secure FL deployments.
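The TL;DR names norm clipping as the mechanism that bounds the influence of adversarial updates. The sketch below shows generic L2 norm clipping applied before FedAvg-style averaging. The flat-list parameter representation, the function names, and the single fixed threshold `max_norm` are illustrative assumptions, not the paper's exact configuration.

```python
import math
from typing import List

# Assumption: model parameters and client updates are flattened into plain lists.
Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a flat parameter-update vector."""
    return math.sqrt(sum(x * x for x in v))


def clip_update(update: Vector, max_norm: float) -> Vector:
    """Scale an update down so its L2 norm is at most max_norm.

    Updates already within the bound are returned unchanged (as a copy).
    """
    norm = l2_norm(update)
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def aggregate_with_clipping(global_params: Vector,
                            client_updates: List[Vector],
                            max_norm: float) -> Vector:
    """FedAvg-style step: clip every client update, average them, apply to the model."""
    clipped = [clip_update(u, max_norm) for u in client_updates]
    n = len(clipped)
    mean_update = [sum(coords) / n for coords in zip(*clipped)]
    return [p + d for p, d in zip(global_params, mean_update)]
```

With two small benign updates and one large update of norm 5, a threshold of 1.0 shrinks the large update to unit norm before averaging, so it can no longer dominate the aggregate. Clipping alone only bounds each update's magnitude; it does not remove a backdoor direction, which is why the TL;DR pairs it with proactive robustification.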

Abstract

Federated learning (FL) systems allow decentralized data-owning clients to jointly train a global model by uploading their locally trained updates to a central server. This decentralization enables adversaries to craft backdoor updates that make the global model misclassify only inputs containing adversary-chosen triggers. Existing defense mechanisms mainly rely on detecting malicious updates after they are received. These methods either fail to identify updates deliberately crafted to be statistically close to benign ones, or perform inconsistently across different FL training stages. The effect of unfiltered backdoor updates accumulates in the global model until the backdoor eventually becomes functional. Given the difficulty of ruling out every backdoor update, we propose a backdoor defense paradigm that focuses on proactively robustifying the global model against potential backdoor attacks. We first reveal that backdoor attacks in FL succeed because malicious and benign updates do not conflict on the redundant neurons of ML models. We then prove the feasibility of activating redundant neurons using out-of-distribution (OOD) samples in centralized settings, and extend this insight to FL settings to propose a novel backdoor defense mechanism, TrojanDam. In TrojanDam, the FL server continuously injects fresh OOD mappings into the global model to activate redundant neurons, canceling the effect of backdoor updates during aggregation. We conduct systematic and extensive experiments showing that TrojanDam outperforms several SOTA backdoor defense methods across a wide range of FL settings.
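The abstract says the server "continuously injects fresh OOD mappings" without specifying how a mapping is built. The sketch below illustrates one plausible reading, stated as an assumption: every OOD sample is assigned a uniformly random class label, and the assignment is redrawn each global round so the mapping stays fresh. The helpers `fresh_ood_mapping`, `sample_server_batch`, and `_round_seed` are hypothetical, and the actual server-side fine-tuning step is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple


def _round_seed(seed: int, round_idx: int) -> int:
    """Hypothetical helper: derive a distinct, reproducible seed per global round."""
    return seed * 1_000_003 + round_idx


def fresh_ood_mapping(ood_ids: Sequence[int],
                      num_classes: int,
                      round_idx: int,
                      seed: int = 0) -> Dict[int, int]:
    """Assign every OOD sample id a label drawn uniformly from [0, num_classes).

    Assumption: a "fresh" mapping means the labels are redrawn each round.
    """
    rng = random.Random(_round_seed(seed, round_idx))
    return {i: rng.randrange(num_classes) for i in ood_ids}


def sample_server_batch(mapping: Dict[int, int],
                        batch_size: int,
                        round_idx: int,
                        seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (ood_sample_id, assigned_label) pairs without replacement.

    Such a batch would feed a server-side fine-tuning step (not shown here).
    """
    rng = random.Random(_round_seed(seed, round_idx) + 1)
    chosen = rng.sample(sorted(mapping), batch_size)
    return [(i, mapping[i]) for i in chosen]
```

Seeding per round keeps the mapping reproducible for a given round while still producing a new mapping in the next one. Whether TrojanDam uses random labels, a fixed target class, or some other assignment is not stated in this section.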

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures, 14 tables, and 2 algorithms.

Figures (8)

  • Figure 1: The backdoor task accuracy and the percentage of detected backdoors of (UPPER) BackdoorIndicator, and (LOWER) FreqFed.
  • Figure 2: The overview of FL systems with TrojanDam.
  • Figure 3: Accuracies of the FL global model on the main task and the poisoned tasks, trained using SGD and Neurotoxin with different percentages of excluded parameters ($k$). The adversary conducts a single-client attack in a continuous fashion starting from the 800th global round.
  • Figure 4: The magnitude distribution of gradients of models trained for a fixed number of iterations using (GREEN) a mixture of main task data and OOD data, (BLUE) only main task data, and (ORANGE) only OOD data.
  • Figure 5: The number of neurons with gradients larger than 0.05 (UPPER) when trained using different OOD dataset sizes, and (LOWER) when trained using 300 flood samples and different parameter selection methods.
  • ...and 3 more figures
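Figure 5 counts neurons whose gradients exceed 0.05, which is how the paper gauges whether OOD data activates redundant neurons. The caption does not say how a neuron's gradient is reduced to one number, so the sketch below assumes it is the mean absolute gradient over that neuron's incoming weights (one row of a fully connected weight-gradient matrix). The function names are hypothetical; only the 0.05 threshold comes from the caption.

```python
from typing import List

# Assumption: weight_grad[j] holds the gradients of all incoming weights of neuron j.
Matrix = List[List[float]]


def neuron_gradient_magnitudes(weight_grad: Matrix) -> List[float]:
    """Mean absolute gradient per output neuron (one value per row)."""
    return [sum(abs(g) for g in row) / len(row) for row in weight_grad]


def count_active_neurons(weight_grad: Matrix, threshold: float = 0.05) -> int:
    """Number of neurons whose gradient magnitude is strictly above threshold."""
    return sum(1 for m in neuron_gradient_magnitudes(weight_grad) if m > threshold)
```

For example, the gradient rows `[0.1, -0.1]`, `[0.01, 0.02]`, and `[0.0, 0.2]` give magnitudes of about 0.1, 0.015, and 0.1, so two of the three neurons count as active at the 0.05 threshold. For convolutional layers, the same idea would apply per output channel by flattening each kernel's gradients into one row.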