Wavelet Scattering Transform and Fourier Representation for Offline Detection of Malicious Clients in Federated Learning

Alessandro Licciardi, Davide Leo, Davide Carbone

TL;DR

This work tackles the detection of malicious or faulty clients in Federated Learning by introducing WAFFLE, an offline detector that operates before training, using low-dimensional spectral embeddings derived from either the Fourier Transform or the Wavelet Scattering Transform (WST). WAFFLE trains its detector on a distilled public auxiliary dataset and relies on PCA-derived client summaries along with spectral embeddings to label clients as benign or malicious with minimal communication overhead. The authors provide theoretical guarantees showing that removing malicious clients yields an unbiased and less noisy global estimator, and demonstrate through extensive experiments that WAFFLE, particularly the WST variant, improves detection accuracy and downstream task performance under both Gaussian and non-Gaussian attacks, including an NLP proof of concept. The method preserves privacy by keeping raw data on-device and sharing only non-invertible embeddings, and it can be combined with existing robust aggregation methods for a multi-layered defense in scalable FL deployments.

Abstract

Federated Learning (FL) enables the training of machine learning models across decentralized clients while preserving data privacy. However, the presence of anomalous or corrupted clients, such as those with faulty sensors or non-representative data distributions, can significantly degrade model performance. Detecting such clients without accessing raw data remains a key challenge. We propose WAFFLE (Wavelet and Fourier representations for Federated Learning), a detection algorithm that labels malicious clients before training, using locally computed compressed representations derived from either the Wavelet Scattering Transform (WST) or the Fourier Transform. Both approaches provide low-dimensional, task-agnostic embeddings suitable for unsupervised client separation. A lightweight detector, trained on a distilled public dataset, performs the labeling with minimal communication and computational overhead. While both transforms enable effective detection, WST offers theoretical advantages, such as non-invertibility and stability to local deformations, that make it particularly well-suited to federated scenarios. Experiments on benchmark datasets show that our method improves detection accuracy and downstream classification performance compared to existing FL anomaly detection algorithms, validating its effectiveness as a pre-training alternative to online detection strategies.
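
As a rough illustration of what a locally computed, low-dimensional Fourier representation could look like, the sketch below summarizes a client's images by a radially averaged log-magnitude spectrum. This is not the authors' embedding: the function name `fourier_embedding`, the radial binning, the number of bins, and the synthetic sinusoidal "clean" data are all assumptions made for illustration, and the PCA summary and the detector described above are omitted.

```python
import numpy as np

def fourier_embedding(images, n_bins=8):
    """Hypothetical client embedding: radially averaged log-magnitude spectrum.

    images: array of shape (N, H, W) holding one client's grayscale images.
    Returns a vector of length n_bins (low to high spatial frequency).
    """
    images = np.asarray(images, dtype=float)
    _, h, w = images.shape
    # 2D FFT of every image, with the zero frequency shifted to the center.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1)))
    # Average the log-magnitude over the client's images.
    log_spec = np.log1p(spec).mean(axis=0)
    # Distance of every frequency cell from the spectrum center.
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)
    # Assign each cell to one of n_bins radial bands and average per band.
    edges = np.linspace(0.0, r.max() + 1e-9, n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=log_spec.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)

# Synthetic demo (assumed data): smooth images vs. the same images plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 32)
clean = np.stack([
    np.outer(np.sin(2 * np.pi * (x + p)), np.cos(2 * np.pi * x))
    for p in rng.random(16)
])
noisy = clean + 0.2 * rng.standard_normal(clean.shape)

emb_clean = fourier_embedding(clean)
emb_noisy = fourier_embedding(noisy)
```

In this toy setting the noisy client carries more energy in the highest-frequency band than the clean one, which is the kind of spectral signature a detector could pick up; only the short embedding vector, not the images, would need to leave the device.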

Paper Structure

This paper contains 30 sections, 6 theorems, 21 equations, 3 figures, 5 tables, 1 algorithm.

Key Result

Lemma 1

If the benign and malicious client updates have different mean parameter values, i.e., $\bar{\theta}^m \neq \bar{\theta}^b$, then the standard federated averaging estimator $\theta_{avg}$ is a biased estimator of $\bar{\theta}^b$, meaning $\mathbb{E}[\theta_{avg}] \neq \bar{\theta}^b$.
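
The intuition behind the lemma can be sketched in one line, under simplifying assumptions that are not stated in this summary: $\theta_{avg}$ is taken to be the unweighted mean of $K$ client updates $\theta_k$, of which $K_m$ are malicious with $\mathbb{E}[\theta_k] = \bar{\theta}^m$ and the remaining $K - K_m$ are benign with $\mathbb{E}[\theta_k] = \bar{\theta}^b$. The symbols $K_m$ and $\alpha = K_m / K$ are notation introduced here for illustration.

$$
\mathbb{E}[\theta_{avg}] = \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}[\theta_k] = (1 - \alpha)\,\bar{\theta}^b + \alpha\,\bar{\theta}^m = \bar{\theta}^b + \alpha\,(\bar{\theta}^m - \bar{\theta}^b)
$$

Under these assumptions the bias is $\alpha\,(\bar{\theta}^m - \bar{\theta}^b)$, which is nonzero whenever $\alpha > 0$ and $\bar{\theta}^m \neq \bar{\theta}^b$. For example, with 40 attackers among $K = 100$ clients, $\alpha = 0.4$.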

Figures (3)

  • Figure 1: Examples of attacked data. Two images downloaded from https://chefjar.com/wp-content/uploads/2024/12/belgian-waffle-recipe-1-1-1000x1477.jpg and https://www.google.com/imgres?imgurl=https://lh3.googleusercontent.com/C8p-ppuCYDwLOcZC-5SXWrmCywhDWV21vh6ri4XIHA_ZiHtJ8vC2WM4FejFA5WGB3iCs&tbnid=iM2fyTR1KCsKWM&vet=1&imgrefurl=https://apptopia.com/google-play/app/ru.rukart.VafliVafeln/about&docid=8aPyeFUJfaAqcM&w=512&h=512&hl=en-IT&source=sh/x/im/m1/1&kgs=f2519ed18262765a. For each image, left: clean client; center: noise attack with magnitude $\sigma = 0.2$; right: blur attack with spread $\beta = 11$.
  • Figure 2: Distribution of the client embeddings $\varphi_k$ for the CIFAR-10 dataset with $K = 100$ clients in a 2-dimensional space, for WAFFLE + FT (left) and WAFFLE + WST (right). There are 60 benign clients (dots) and 40 attackers: 20 noisy (crosses) and 20 blurred (triangles). Both methods noticeably separate the client groups.
  • Figure 3: Embeddings $\varphi_k$ produced by WAFFLE for three clients (blur attacker, noise attacker, and benign) on CIFAR-10. The left panel shows the embeddings obtained using FT, while the right panel shows those obtained using WST.
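
The two corruptions shown in Figure 1 can be mimicked with a few lines of NumPy. This is a hedged sketch, not the paper's attack code: treating $\sigma$ as the standard deviation of additive Gaussian noise, treating $\beta$ as the (odd) size in pixels of a Gaussian blur kernel, the kernel standard deviation `beta / 4`, the clipping to $[0, 1]$, and the function names are all assumptions.

```python
import numpy as np

def noise_attack(img, sigma=0.2, rng=None):
    """Additive Gaussian noise (sigma read as a standard deviation; assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + sigma * rng.standard_normal(img.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in [0, 1] (assumption)

def gaussian_kernel(size, std):
    """Normalized 1D Gaussian kernel with `size` taps."""
    x = np.arange(size) - (size - 1) / 2
    k = np.exp(-0.5 * (x / std) ** 2)
    return k / k.sum()

def blur_attack(img, beta=11):
    """Separable Gaussian blur (beta read as an odd kernel size; assumption)."""
    assert beta % 2 == 1, "this sketch only handles odd kernel sizes"
    k = gaussian_kernel(beta, std=beta / 4)  # hypothetical choice of std
    pad = beta // 2
    # Reflect-pad so the 'valid' convolutions return the original shape.
    padded = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

# Demo on a random grayscale image with values in [0, 1].
rng = np.random.default_rng(0)
img = rng.random((32, 32))
noisy = noise_attack(img, sigma=0.2, rng=rng)
blurred = blur_attack(img, beta=11)
```

The noise attack adds high-frequency energy while the blur attack removes it, which is consistent with the two attacker groups landing in different regions of the embedding space in Figure 2.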

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Lemma 2
  • Proposition 1
  • Lemma A1
  • proof
  • Lemma A2
  • ...and 3 more