TrojanDam: Detection-Free Backdoor Defense in Federated Learning through Proactive Model Robustification utilizing OOD Data

Yanbo Dai; Songze Li; Zihan Gan; Xueluan Gong

TL;DR

This work tackles backdoor threats in federated learning (FL) by moving from post-hoc detection to proactive defense. TrojanDam robustifies redundant neurons in the global model at the server using out-of-distribution (OOD) flood and shadow data, applying kernel-level gradient projections and BN-statistics handling to cancel backdoor effects during aggregation. The approach does not try to identify malicious client updates; instead, it fortifies the model before aggregation and uses norm clipping to mitigate adversarial updates. Experiments on CIFAR-10, CIFAR-100, and EMNIST show TrojanDam achieving state-of-the-art backdoor suppression across diverse attack strategies and non-IID settings, with minimal impact on main-task performance, highlighting its practical potential for secure FL deployments.

Abstract

Federated learning (FL) systems allow decentralized data-owning clients to jointly train a global model by uploading their locally trained updates to a centralized server. This decentralization enables adversaries to craft backdoor updates that make the global model misclassify only when it encounters adversary-chosen triggers. Existing defense mechanisms mainly rely on post-training detection after receiving updates. These methods either fail to identify updates that are deliberately fabricated to be statistically close to benign ones, or perform inconsistently across different FL training stages. The effect of unfiltered backdoor updates accumulates in the global model and eventually becomes functional. Given the difficulty of ruling out every backdoor update, we propose a backdoor defense paradigm that focuses on proactively robustifying the global model against potential backdoor attacks. We first reveal that the success of backdoor attacks in FL stems from the lack of conflict between malicious and benign updates on redundant neurons of ML models. We then prove the feasibility of activating redundant neurons with out-of-distribution (OOD) samples in centralized settings, and migrate this approach to FL settings to propose a novel backdoor defense mechanism, TrojanDam. The proposed mechanism has the FL server continuously inject fresh OOD mappings into the global model to activate redundant neurons, canceling the effect of backdoor updates during aggregation. We conduct systematic and extensive experiments to illustrate the superior performance of TrojanDam over several SOTA backdoor defense methods across a wide range of FL settings.
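
The TL;DR mentions norm clipping as the way adversarial updates are bounded before aggregation. The sketch below illustrates only that generic step, under stated assumptions: updates are flat lists of floats, aggregation is a plain unweighted mean, and the names `clip_update`, `aggregate`, and `max_norm` are hypothetical. It is not the authors' implementation and does not cover the OOD injection or gradient projection parts of TrojanDam.

```python
import math


def clip_update(update, max_norm):
    """Scale `update` so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)  # already within the bound
    scale = max_norm / norm
    return [x * scale for x in update]


def aggregate(global_weights, client_updates, max_norm):
    """Add the unweighted mean of the norm-clipped updates to the global weights."""
    clipped = [clip_update(u, max_norm) for u in client_updates]
    n = len(clipped)
    mean_update = [sum(vals) / n for vals in zip(*clipped)]
    return [w + d for w, d in zip(global_weights, mean_update)]


# Toy round: one small benign update and one scaled-up (hypothetical) malicious update.
global_weights = [0.0, 0.0]
benign = [0.3, 0.4]       # L2 norm 0.5, left unchanged
malicious = [30.0, 40.0]  # L2 norm 50, rescaled to norm 1.0
new_weights = aggregate(global_weights, [benign, malicious], max_norm=1.0)
print(new_weights)
```

Clipping bounds how far any single client can move the global model in one round, but it does not remove a backdoor direction on its own, which is why the abstract pairs aggregation with server-side robustification.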
