DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

Kichang Lee, Yujin Shin, Jonghyuk Yun, Songkuk Kim, Jun Han, JeongGil Ko

TL;DR

DeTrigger introduces a gradient-centric defense for federated learning to detect and mitigate backdoor attacks at scale. By exploiting adversarial perturbation principles and applying temperature scaling to gradient signals, it isolates backdoor triggers and prunes the corresponding activations, preserving benign knowledge. Across four public datasets and multiple models, DeTrigger achieves substantial mitigation (up to 98.9%) with a dramatic speedup in detection (up to 251×) compared with traditional defenses, while maintaining global model accuracy. The framework combines a gradient preprocessing pipeline, total-variation and transferability-based detection, and a targeted pruning mechanism, demonstrating practical, scalable protection for federated learning in mobile and embedded environments.

Abstract

Federated Learning (FL) enables collaborative model training across distributed devices while preserving local data privacy, making it ideal for mobile and embedded systems. However, the decentralized nature of FL also opens vulnerabilities to model poisoning attacks, particularly backdoor attacks, where adversaries implant trigger patterns to manipulate model predictions. In this paper, we propose DeTrigger, a scalable and efficient backdoor-robust federated learning framework that leverages insights from adversarial attack methodologies. By employing gradient analysis with temperature scaling, DeTrigger detects and isolates backdoor triggers, allowing for precise model weight pruning of backdoor activations without sacrificing benign model knowledge. Extensive evaluations across four widely used datasets demonstrate that DeTrigger achieves up to 251× faster detection than traditional methods and mitigates backdoor attacks by up to 98.9%, with minimal impact on global model accuracy. Our findings establish DeTrigger as a robust and scalable solution to protect federated learning environments against sophisticated backdoor threats.
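
The summary above names two generic building blocks: temperature scaling and total variation. The sketch below illustrates only the textbook definitions of each, in plain Python. It is not DeTrigger's implementation; the function names, the example logits, and the toy 4×4 maps are assumptions made for illustration, and how the paper combines these quantities is not specified in this section.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Standard softmax over logits divided by a temperature.

    Higher temperatures flatten the output distribution.
    (Hypothetical helper; not taken from the paper.)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(grid):
    """Anisotropic total variation of a 2D map: the sum of absolute
    differences between horizontally and vertically adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                tv += abs(grid[r][c + 1] - grid[r][c])
            if r + 1 < rows:
                tv += abs(grid[r + 1][c] - grid[r][c])
    return tv


# Toy maps (assumed for illustration): a compact 2x2 patch versus a
# checkerboard in which every neighbouring pair differs.
patch_map = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
checkerboard_map = [[(r + c) % 2 for c in range(4)] for r in range(4)]

example_logits = [2.0, 1.0, 0.1]  # arbitrary example values
sharp = softmax_with_temperature(example_logits, temperature=1.0)
flat = softmax_with_temperature(example_logits, temperature=10.0)
```

In this toy setting the compact patch has a much smaller total variation than the checkerboard, and the higher temperature yields a flatter probability vector. Whether and how such signals separate backdoored from benign updates is a matter for the paper's own method and evaluation.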

Paper Structure

This paper contains 23 sections and 16 figures.

Figures (16)

  • Figure 1: (a) Illustration of a backdoor attack in a federated learning scenario, covering local training, server-side global aggregation, and local inference operations. (b) Comparison of our work with previously proposed approaches in addressing backdoor attacks.
  • Figure 2: (a) Illustration of adversarial and backdoor attacks represented with class decision boundaries. (b) Normal and backdoor-affected gradients for an input sample are presented in the normal data plane and backdoor plane. (c) Samples used in the preliminary study show valid and backdoor samples with detected triggers.
  • Figure 3: Overall workflow of DeTrigger. DeTrigger leverages insights from adversarial attack methodologies to effectively identify the trigger and prune the backdoor knowledge. The server first distributes the global model to selected clients for the local model training. Malicious clients may introduce backdoor triggers into their datasets. To mitigate this, DeTrigger analyzes gradients to detect potential backdoor triggers and prunes suspicious models before updating the global model.
  • Figure 4: Illustration of gradient preprocessing and trigger extraction operations in DeTrigger.
  • Figure 5: (a) Conceptual illustration of the impact of temperature scaling on normal data feature space and backdoor feature space. (b) Sample of ground truth and inferred triggers with different temperatures. (c) L1-norm between ground truth and inferred triggers with varying temperatures.
  • ...and 11 more figures