Optimal Defenses Against Gradient Reconstruction Attacks

Yuxiao Chen, Gamze Gürsoy, Qi Lei

TL;DR

Experimental results validate that the proposed parameter- and model-specific defenses outperform Gradient Noise and Gradient Pruning, protecting the training data better while also achieving better utility.

Abstract

Federated Learning (FL) is designed to prevent data leakage through collaborative model training without centralized data storage. However, it remains vulnerable to gradient reconstruction attacks that recover original training data from shared gradients. To optimize the trade-off between data leakage and utility loss, we first derive a theoretical lower bound on the reconstruction error (over all attackers) for two standard defenses: adding noise and gradient pruning. We then customize these two defenses to be parameter- and model-specific, achieving the optimal trade-off between our reconstruction lower bound and model utility. Experimental results validate that our methods outperform Gradient Noise and Gradient Pruning, protecting the training data better while also achieving better utility.
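For readers unfamiliar with the two baseline defenses named in the abstract, the following is a minimal sketch of how they are commonly applied to a flattened gradient before it is shared. It is not the paper's optimal method: the function names, the isotropic Gaussian noise, and the magnitude-based pruning rule are assumptions chosen for illustration, and both baselines here treat every parameter uniformly.

```python
import random

def add_gradient_noise(grad, sigma, rng):
    """Baseline Gradient Noise (assumed form): add i.i.d. Gaussian noise
    with standard deviation `sigma` to every gradient entry."""
    return [g + rng.gauss(0.0, sigma) for g in grad]

def prune_gradient(grad, ratio):
    """Baseline Gradient Pruning (assumed form): zero out the `ratio`
    fraction of entries with the smallest magnitude."""
    k = int(ratio * len(grad))
    # Indices ordered by ascending magnitude; the first k are dropped.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else g for i, g in enumerate(grad)]

# Hypothetical gradient of 20 parameters, for illustration only.
rng = random.Random(0)
grad = [rng.gauss(0.0, 1.0) for _ in range(20)]

noisy = add_gradient_noise(grad, sigma=0.1, rng=rng)
pruned = prune_gradient(grad, ratio=0.8)
```

Both functions use a single global setting (`sigma` or `ratio`) for all parameters. The customized defenses described in the abstract instead adapt the defense per parameter and per model, which this sketch does not attempt to reproduce.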

Paper Structure

This paper contains 39 sections, 7 theorems, 70 equations, 14 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $B_{\mathcal{D},S}$ be as defined in Definition 1. Under Assumptions 1 to 5, we lower bound $B_{\mathcal{D},S}$ by: where ${\bm{J}}_F({\bm{x}})$ is given by: and ${\bm{J}}_P$ by:

Figures (14)

  • Figure 1: DP-SGD treats all parameters with the same vulnerability, while our method distinguishes the vulnerability of each parameter and designs a customized defense strategy.
  • Figure 2: Average reconstruction indexes based on Gradient Inversion with batch size 16.
  • Figure 3: Training curves of a CNN on MNIST with 80% and 90% gradient pruning and 80% optimal pruning (smoothed with window size 8). 80% optimal pruning outperforms 90% gradient pruning in training.
  • Figure 4: Scatter plot of gradient pruning and our optimal pruning on MNIST. X-axis: average reconstruction MSE. Y-axis: training loss on 64 samples. Point size: pruning ratio.
  • Figure 5: Average reconstruction indexes based on Gradient Inversion for DP-SGD. The noise scale equals the Frobenius norm of the covariance matrix.
  • ...and 9 more figures

Theorems & Definitions (17)

  • Definition 1
  • Theorem 1
  • Remark 1
  • Definition 2
  • Theorem 2: Optimal Gradient Noise
  • Theorem 3: Optimal DP-SGD
  • Theorem 4: Optimal Gradient Pruning
  • Remark 2
  • Lemma 1
  • proof
  • ...and 7 more