EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection

Yuwen Qian; Shuchi Wu; Kang Wei; Ming Ding; Di Xiao; Tao Xiang; Chuan Ma; Song Guo

TL;DR

The paper shows that backdoors in Federated Self-Supervised Learning manifest as embedding-space manipulation rather than label-space tricks, making them hard to detect with traditional defenses. It introduces EmInspector, an embedding-space inspection approach that uses a small inspection set to detect malicious clients by comparing embedding similarities and removing suspected models before aggregation. Empirical results across CIFAR-10/100, STL-10, and GTSRB demonstrate that EmInspector dramatically reduces attack success rates while preserving or improving the primary task accuracy, outperforming several baselines and proving robust to adaptive attackers. The method is simple to implement, requires minimal inspection data, and scales across architectures and data distributions, offering a practical defense for real-world FSSL deployments.
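
The TL;DR describes the defense as comparing embedding similarities on a small inspection set and dropping suspected clients before aggregation. The plain-Python sketch below illustrates that idea under explicit assumptions. The per-image score (a client's mean cosine similarity to all other clients), the above-average vote, the majority threshold `vote_ratio`, the helper names (`flag_suspicious`, `aggregate`, `make_synthetic`), and the synthetic 40-benign/10-backdoored data are all hypothetical choices made for illustration. They are not EmInspector's exact scoring rule.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _unit(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]


def flag_suspicious(embeddings: List[List[Vector]], vote_ratio: float = 0.5) -> List[int]:
    """Return indices of clients whose embeddings cluster unusually tightly.

    embeddings[c][j] is client c's embedding of inspection image j.
    Hypothetical rule: on each image, a client gets one vote if its mean cosine
    similarity to all other clients exceeds the average of those scores; clients
    voted on more than vote_ratio of the images are flagged.
    """
    n_clients, n_images = len(embeddings), len(embeddings[0])
    votes = [0] * n_clients
    for j in range(n_images):
        units = [_unit(embeddings[c][j]) for c in range(n_clients)]
        totals = [0.0] * n_clients
        for a in range(n_clients):
            for b in range(a + 1, n_clients):
                sim = sum(x * y for x, y in zip(units[a], units[b]))
                totals[a] += sim
                totals[b] += sim
        scores = [t / (n_clients - 1) for t in totals]
        threshold = sum(scores) / n_clients
        for c, score in enumerate(scores):
            if score > threshold:
                votes[c] += 1
    return [c for c, v in enumerate(votes) if v > vote_ratio * n_images]


def aggregate(updates: List[Vector], excluded: List[int]) -> Vector:
    """Plain FedAvg-style mean over the clients that were not flagged."""
    skip = set(excluded)
    kept = [u for i, u in enumerate(updates) if i not in skip]
    return [sum(vals) / len(kept) for vals in zip(*kept)]


def make_synthetic(n_benign=40, n_bad=10, n_images=10, dim=256, seed=0):
    """Toy embeddings: benign ones are random, backdoored ones share a center."""
    rng = random.Random(seed)
    emb = [[None] * n_images for _ in range(n_benign + n_bad)]
    for j in range(n_images):
        center = [rng.gauss(0, 1) for _ in range(dim)]
        for c in range(n_benign):
            emb[c][j] = [rng.gauss(0, 1) for _ in range(dim)]
        for c in range(n_benign, n_benign + n_bad):
            emb[c][j] = [x + rng.gauss(0, 0.05) for x in center]
    return emb


if __name__ == "__main__":
    print("flagged clients:", flag_suspicious(make_synthetic()))
```

On this synthetic data the ten tightly clustered clients (indices 40 to 49) are the ones flagged. Because the threshold is relative to the other clients' scores, this toy rule can also discard some benign clients when no attack is present. A real defense has to handle that case.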

Abstract

Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that lets clients exploit vast amounts of unlabeled data while preserving data privacy. Despite these advantages, FSSL's susceptibility to backdoor attacks, a known concern in traditional federated supervised learning (FSL), has not yet been investigated. To fill this gap, we conduct a comprehensive investigation of a backdoor attack paradigm in which unscrupulous clients collude to manipulate the global model, revealing that FSSL is vulnerable to such attacks. In FSL, backdoor attacks typically build a direct association between the backdoor trigger and the target label. In FSSL, by contrast, backdoor attacks aim to shift the global model's representation of images containing the attacker's trigger pattern toward the attacker's intended target class, a less direct mechanism. We show that existing defenses are insufficient to mitigate these backdoor attacks in FSSL, so an effective defense mechanism is urgently needed. To address this, we examine the fundamental mechanism of backdoor attacks on FSSL and propose the Embedding Inspector (EmInspector), which detects malicious clients by inspecting the embedding space of local models. Specifically, EmInspector assesses the similarity of embeddings from different local models on a small set of inspection images (e.g., ten images from CIFAR100), with no specific requirements on sample distribution or labels. We find that, for a given inspection image, embeddings from backdoored models tend to cluster together in the embedding space. Evaluation results show that EmInspector effectively mitigates backdoor attacks on FSSL across various adversary settings. Our code is available at https://github.com/ShuchiWu/EmInspector.
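
The abstract contrasts FSSL backdoors with the trigger-to-label association used in FSL: the attacker instead moves the embedding of triggered images toward the target class. One common way to express this is a two-term local objective. It is borrowed here from encoder-backdoor work such as BadEncoder as an assumption and is not taken from this paper. The effectiveness term maximizes cosine similarity between triggered inputs and target-class reference inputs. The utility term keeps clean embeddings close to those of a frozen clean encoder. The names `stamp_trigger`, `backdoor_loss`, and `lam` are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def stamp_trigger(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of a grayscale image with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for k, value in enumerate(patch_row):
            out[top + i][left + k] = value
    return out


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def backdoor_loss(triggered: List[Sequence[float]],
                  target_refs: List[Sequence[float]],
                  clean_local: List[Sequence[float]],
                  clean_frozen: List[Sequence[float]],
                  lam: float = 1.0) -> float:
    """Simplified BadEncoder-style objective (lower is better for the attacker).

    triggered:    local-encoder embeddings of trigger-stamped images
    target_refs:  local-encoder embeddings of target-class reference images
    clean_local:  local-encoder embeddings of clean images
    clean_frozen: embeddings of the same clean images from a frozen clean encoder
    """
    # Effectiveness: pull every triggered embedding toward every target reference.
    eff = -sum(cosine(t, r) for t in triggered for r in target_refs)
    eff /= len(triggered) * len(target_refs)
    # Utility: keep clean embeddings where the clean encoder puts them.
    util = -sum(cosine(a, b) for a, b in zip(clean_local, clean_frozen))
    util /= len(clean_local)
    return eff + lam * util
```

An attacker would minimize this loss with respect to its local encoder's parameters (for example, by gradient descent in a deep-learning framework) before submitting its update. This plain-Python version only evaluates the objective.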

Paper Structure (35 sections, 15 equations, 12 figures, 10 tables, 3 algorithms)

Introduction
Preliminaries
Federated Self-Supervised Learning
Backdoor Attacks in FL & SSL
Defense Mechanisms for FL
System Formulation
System Model of FSSL
Attacker's Objectives and Capability
Backdoor Attacks on FSSL
Design of EmInspector
Key observations
Backdoor Mitigation for FSSL
Experimental Setup
Datasets and Models
FSSL Setting
...and 20 more sections

Figures (12)

Figure 1: An effective backdoor attack in SSL should make the encoder pull the embedding of a triggered image toward the target class (below, with "car" as the example target class), while the clean encoder behaves normally (above).
Figure 2: Overview of backdoor attacks on FSSL with two different collusion methods. In the single pattern attack, all malicious clients use the same trigger pattern for attacking, while in the coordinated pattern attack, the attacker distributes a unique trigger for each malicious client and uses an assembled pattern of them to trigger the backdoor embedded in the aggregated model.
Figure 3: Relative errors of the gap between the target class and other classes measured by a backdoored and a benign model. Epochs for injecting backdoors vary from 1 to 20.
Figure 4: t-SNE plots of the embedding space. Embeddings are generated by 10 backdoored models, 40 benign models, and the current global model. (a) and (b) are plotted under a non-i.i.d. and an i.i.d. data setting, respectively. The input image is shown in the bottom-right corner in (a) and the bottom-left corner in (b).
Figure 5: Attacking from the very beginning vs. attacking when the global model is close to convergence. CA denotes the ACC of the clean model under the no-attack setting, BA the ACC of the backdoored model, and ASR the attack success rate. The prefix "E" denotes the "attack early" scenario and the prefix "L" the "attack late" scenario.
...and 7 more figures
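
In Figure 2's coordinated pattern attack, each malicious client receives a unique local trigger, and the assembled pattern activates the backdoor in the aggregated model. The sketch below shows only that bookkeeping. It assumes the global trigger is a set of pixel coordinates split round-robin into disjoint pieces, a split in the style of distributed backdoor attacks chosen for illustration; the paper's actual partitioning may differ. `square_trigger`, `split_trigger`, and `assemble` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def square_trigger(top: int, left: int, size: int) -> List[Pixel]:
    """Coordinates of a size x size square patch, in row-major order."""
    return [(top + i, left + k) for i in range(size) for k in range(size)]


def split_trigger(pattern: List[Pixel], n_clients: int) -> List[Set[Pixel]]:
    """Round-robin split into disjoint local triggers, one per malicious client."""
    parts: List[Set[Pixel]] = [set() for _ in range(n_clients)]
    for idx, pixel in enumerate(pattern):
        parts[idx % n_clients].add(pixel)
    return parts


def assemble(parts: List[Set[Pixel]]) -> Set[Pixel]:
    """Union of the local triggers: the pattern used to trigger the global model."""
    full: Set[Pixel] = set()
    for part in parts:
        full |= part
    return full


if __name__ == "__main__":
    pattern = square_trigger(top=28, left=28, size=4)  # e.g. bottom-right of a 32x32 image
    local = split_trigger(pattern, n_clients=4)
    print([sorted(p) for p in local])
    print(assemble(local) == set(pattern))
```

In the single pattern attack, every malicious client would instead stamp the full `pattern`, which corresponds to `split_trigger(pattern, 1)`.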

Theorems & Definitions (1)

proof
