
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi

Abstract

Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as sunglasses) and optimizes an aggregation-aware malicious objective with feature separation and parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.

Paper Structure

This paper contains 33 sections, 8 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Overview of SABLE, our semantics-aware backdoor attack in federated learning. Top: Standard FL round with a mixture of benign and malicious clients. Benign clients train on clean local data, while malicious clients return poisoned updates that are aggregated by the server into the next global model. Bottom: Training pipeline on a malicious client. A subset of local images is used to generate semantic triggers, producing paired clean/triggered samples. The malicious client optimizes a joint objective that combines clean cross-entropy, triggered cross-entropy toward the target label, feature-separation loss on paired samples, and a regularization term that keeps its update close to the global model, yielding a poisoned local model that still appears benign under aggregation.
  • Figure 2: Example CelebA samples used in SABLE. Top row: clean images with their original hair-color labels. Bottom row: corresponding triggered images where a natural semantic modification (sunglasses) is applied and the labels are remapped to the target class.
  • Figure 3: Clean and triggered GTSRB stop-sign samples used in our experiments. Top row: clean STOP signs with their ground-truth label. Bottom row: triggered versions where a small blue cap symbol is placed above the sign, forming a semantics-aligned backdoor pattern that is relabeled to the attacker-chosen target class.
  • Figure 4: Representation-level shift induced by SABLE versus the baseline. UMAP visualization of penultimate-layer embeddings from a ResNet-18 hair-color classifier on CelebA under MultiKrum training.
  • Figure 5: ASR of SABLE under varying malicious client ratios in a 10-client federated learning setup. We report mean ASR (%) for FedAvg, Trimmed Mean, and MultiKrum on CelebA with ResNet-18; error bars denote standard deviation across runs.
  • ...and 1 more figure
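The malicious-client objective outlined in the Figure 1 caption combines four terms: clean cross-entropy, triggered cross-entropy toward the target label, a feature-separation loss on paired clean/triggered samples, and a regularization term tying the local update to the global model. The following is a minimal, self-contained sketch of one plausible instantiation; the function names, the cosine-similarity form of the separation loss, and the weights `lam_sep`/`lam_reg` are illustrative assumptions, not the paper's actual implementation:

```python
import math

def softmax_ce(logits, label):
    # Cross-entropy of one logit vector (list of floats) vs. an integer label,
    # computed with the usual max-shift for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def sable_objective(clean_logits, clean_label,
                    trig_logits, target_label,
                    clean_feat, trig_feat,
                    local_params, global_params,
                    lam_sep=0.1, lam_reg=0.01):
    # Clean-task loss: preserve benign accuracy on unpoisoned data.
    l_clean = softmax_ce(clean_logits, clean_label)
    # Backdoor loss: push triggered inputs toward the attacker's target label.
    l_trig = softmax_ce(trig_logits, target_label)
    # Feature separation on a paired sample: here, penalize similarity between
    # clean and triggered penultimate-layer features (one possible choice).
    l_sep = cosine(clean_feat, trig_feat)
    # Parameter regularization: keep the poisoned local model close to the
    # global model so the update still appears benign under aggregation.
    l_reg = sum((p - g) ** 2 for p, g in zip(local_params, global_params))
    return l_clean + l_trig + lam_sep * l_sep + lam_reg * l_reg
```

In a real attack these terms would be batched tensor losses backpropagated through the local model; the per-sample scalar form above only illustrates how the four components trade off stealth (the clean and regularization terms) against backdoor strength (the triggered and separation terms).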