On the Detectability of Active Gradient Inversion Attacks in Federated Learning

Vincenzo Carletti, Pasquale Foggia, Carlo Mazzocca, Giuseppe Parrella, Mario Vento

TL;DR

The paper tackles privacy risks in Federated Learning by analyzing active gradient inversion attacks (GIAs) and their detectability. It provides a comprehensive assessment of four state-of-the-art active GIAs (two handcrafted and two learned) and introduces lightweight, client-side detectors based on weight-space anomalies and loss/gradient dynamics. Through extensive experiments on multiple datasets and architectures, the detectors demonstrate strong performance and robustness, even under partial participation, without requiring changes to the FL protocol. The work highlights practical defense implications for FL deployments and emphasizes the need to consider behavioral side-channels alongside gradient-space signals in adversarial settings.

Abstract

One of the key advantages of Federated Learning (FL) is its ability to collaboratively train a Machine Learning (ML) model while keeping clients' data on-site. However, this can create a false sense of security. Although not sharing private data improves overall privacy, prior studies have shown that the gradients exchanged during FL training remain vulnerable to Gradient Inversion Attacks (GIAs). These attacks make it possible to reconstruct the clients' local data, breaking the privacy promise of FL. GIAs can be launched by either a passive or an active server. In the latter case, a malicious server manipulates the global model to facilitate data reconstruction. While effective, earlier attacks in this category have been shown to be detectable by clients, limiting their real-world applicability. Recently, novel active GIAs have emerged, claiming to be far stealthier than previous approaches. This work provides the first comprehensive analysis of these claims, investigating four state-of-the-art GIAs. We propose novel lightweight client-side detection techniques, based on statistically improbable weight structures and anomalous loss and gradient dynamics. Extensive evaluation across several configurations demonstrates that our methods enable clients to effectively detect active GIAs without any modifications to the FL training protocol.
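To make the idea of a "statistically improbable weight structure" concrete, here is a minimal sketch of one possible client-side check. It is not the detector proposed in the paper: the function names, the mean pairwise cosine similarity statistic, and the 0.9 threshold are all assumptions chosen for illustration. The intuition is that independently initialized or normally trained neurons in a layer tend to point in different directions, whereas a layer whose neurons are near-copies of one another would be very unlikely to arise by chance.

```python
import numpy as np

def mean_pairwise_cosine(W, eps=1e-12):
    """Mean absolute cosine similarity between distinct rows (neurons) of W."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.maximum(norms, eps)           # unit-normalize each neuron
    S = U @ U.T                              # pairwise cosine similarities
    off_diag = S[~np.eye(W.shape[0], dtype=bool)]
    return float(np.abs(off_diag).mean())

def looks_collapsed(W, threshold=0.9):
    """Hypothetical check: flag a layer whose neurons are nearly parallel.

    The threshold is an arbitrary illustrative value, not taken from the paper.
    """
    return mean_pairwise_cosine(W) > threshold

rng = np.random.default_rng(0)

# Benign-looking layer: 64 neurons with independent Gaussian weights.
benign = rng.normal(size=(64, 128))

# Suspicious layer: every neuron is a rescaled copy of the same weight vector.
base = rng.normal(size=(1, 128))
collapsed = np.tile(base, (64, 1)) * rng.uniform(0.5, 2.0, size=(64, 1))

benign_flag = looks_collapsed(benign)
collapsed_flag = looks_collapsed(collapsed)
```

A client could run such a check on each received global model before training on it; a real deployment would need a threshold calibrated per architecture and layer.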

Paper Structure

This paper contains 20 sections, 21 equations, 8 figures, 6 tables, 3 algorithms.

Figures (8)

  • Figure 1: Graphical illustration of an active GIA workflow. (i) The server manipulates the global model and distributes it to clients. (ii) Each client performs local training using the manipulated global model to compute model updates. (iii) The server receives these updates and exploits them to reconstruct the client's training data.
  • Figure 2: Graphical illustration of linear layer leakage. Arrows indicate that each input exclusively activates a specific neuron in the linear layer. When a single input is present in the batch, its value can be immediately reconstructed from the corresponding gradients. However, when multiple inputs activate the same neuron simultaneously, the resulting gradients represent a weighted sum of those inputs, making recovery more complex [trap_w].
  • Figure 3: Neuron diversity collapse visualized via 3D PCA. Benign neurons (blue) form a scattered cloud, indicating varied features. All neurons of the model manipulated with the attack of Shi et al. [Shi2023ScaleMIAAS] collapse to a single point (red), showing that the attack forces identical neuron representations.
  • Figure 4: Victim client local accuracy (64 samples), showing a severe degradation at round 150, when the server distributes a manipulated model $\theta^{\mathcal{A}}_{150}$ based on Garov et al. [garov2024hiding] and Shan et al. [shan2025geminio].
  • Figure 5: 3D comparison of loss surfaces. The blue surface is the loss of a legitimate, pre-manipulation model (e.g., $\theta_{t-1}$); it is contrasted with the loss surface of a model manipulated with the attack of Shan et al. [shan2025geminio] (e.g., when $\theta_t$ is obtained with Eqn. \ref{eq:geminio}).
  • ...and 3 more figures
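
The linear layer leakage described in the Figure 2 caption can be illustrated with a small NumPy sketch. This is a toy setup under stated assumptions, not the paper's experimental code: the layer sizes, the loss $L = \frac{1}{2}\sum \mathrm{ReLU}(z)^2$, and the helper names are hypothetical. For a neuron $i$ with pre-activation $z_i = w_i^\top x + b_i$, the gradients satisfy $\nabla_{w_i} L = \delta_i x$ and $\nabla_{b_i} L = \delta_i$, so dividing one by the other recovers $x$ exactly when a single input activates the neuron, and only a weighted mix when several inputs do.

```python
import numpy as np

def linear_layer_grads(X, W, b):
    """Batch-summed gradients of L = 0.5 * sum(relu(X W^T + b)^2) w.r.t. W and b."""
    Z = X @ W.T + b                 # pre-activations, shape (batch, neurons)
    delta = np.maximum(Z, 0.0)      # dL/dZ for this toy loss
    grad_W = delta.T @ X            # shape (neurons, features)
    grad_b = delta.sum(axis=0)      # shape (neurons,)
    return grad_W, grad_b

def invert_neuron(grad_W, grad_b, i):
    """Divide the weight gradient of neuron i by its bias gradient."""
    return grad_W[i] / grad_b[i]

rng = np.random.default_rng(0)
d, n = 6, 3
# Non-negative weights, inputs, and bias keep every neuron active in this toy case.
W = rng.random((n, d))
b = np.full(n, 0.1)
x1 = rng.random(d)
x2 = rng.random(d)

# One input in the batch: the ratio reproduces it exactly.
gW, gb = linear_layer_grads(x1[None, :], W, b)
single = invert_neuron(gW, gb, 0)

# Two inputs activating the same neuron: the ratio is a weighted average of both.
gW2, gb2 = linear_layer_grads(np.stack([x1, x2]), W, b)
mixed = invert_neuron(gW2, gb2, 0)
```

Here `single` matches `x1`, while `mixed` lies between `x1` and `x2`, which is why handcrafted active attacks try to arrange that each input activates its own neuron.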