SPEAR: Exact Gradient Inversion of Batches in Federated Learning

Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

TL;DR

SPEAR challenges the prevailing view that exact batch reconstruction in honest-but-curious federated learning is limited to $b=1$, showing that entire batches with $b>1$ can be recovered exactly by exploiting the low-rank structure of gradients and ReLU-induced sparsity. The method combines an explicit low-rank representation with a sampling-based search for candidate disaggregation directions, a sparsity-driven filtering step, and a greedy optimization that assembles the correct batch, with the scale of each direction recovered from the bias gradient. The authors analyze theoretically when exact recovery is possible and quantify the sampling complexity and failure probability. An efficient GPU implementation achieves high-precision reconstructions on ImageNet-scale data for batches up to roughly $b\approx 25$. Empirically, SPEAR outperforms prior gradient inversion methods in accuracy and speed on tasks ranging from MNIST to ImageNet and remains robust to DPSGD noise. Its cost scales exponentially with $b$, which motivates defenses based on larger effective batch sizes and further study of privacy-preserving mechanisms.

Abstract

Federated learning is a framework for collaborative machine learning where clients only share gradient updates and not their private data with a server. However, it was recently shown that gradient inversion attacks can reconstruct this data from the shared gradients. In the important honest-but-curious setting, existing attacks enable exact reconstruction only for batch size of $b=1$, with larger batches permitting only approximate reconstruction. In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b >1$ exactly. SPEAR combines insights into the explicit low-rank structure of gradients with a sampling-based algorithm. Crucially, we leverage ReLU-induced gradient sparsity to precisely filter out large numbers of incorrect samples, making a final reconstruction step tractable. We provide an efficient GPU implementation for fully connected networks and show that it recovers high-dimensional ImageNet inputs in batches of up to $b \lesssim 25$ exactly while scaling to large networks. Finally, we show theoretically that much larger batches can be reconstructed with high probability given exponential time.

Paper Structure

This paper contains 64 sections, 16 theorems, 17 equations, 24 figures, 11 tables, 3 algorithms.

Key Result

Theorem 3.1

The network's gradient w.r.t. the weights ${\bm{W}}$ can be represented as the matrix product:
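For a fully connected layer ${\bm{Z}} = {\bm{W}}{\bm{X}} + {\bm{b}}\mathbf{1}^\top$ whose inputs are stacked as the columns of ${\bm{X}}$, the chain rule gives the weight gradient as the product $\frac{\partial \mathcal{L}}{\partial {\bm{Z}}}{\bm{X}}^\top$, so its rank is at most the batch size $b$. The sketch below checks this numerically on a toy ReLU layer. The layer sizes, the squared-error loss, and the targets `T` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, b = 8, 6, 3                 # input dim, layer width, batch size (toy values)

X = rng.normal(size=(n, b))       # one input per column
W = rng.normal(size=(m, n))
bias = rng.normal(size=(m, 1))
T = rng.normal(size=(m, b))       # hypothetical regression targets


def loss(W_, bias_):
    """Squared-error loss on the ReLU output of a single linear layer."""
    Z_ = W_ @ X + bias_
    A_ = np.maximum(Z_, 0.0)
    return 0.5 * np.sum((A_ - T) ** 2)


# Analytic gradients via the chain rule.
Z = W @ X + bias
A = np.maximum(Z, 0.0)
dL_dZ = (A - T) * (Z > 0)         # ReLU zeroes the entries where Z <= 0
grad_W = dL_dZ @ X.T              # the matrix product: (m x b) @ (b x n)
grad_b = dL_dZ.sum(axis=1, keepdims=True)


def numeric_grad_W(eps=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    G = np.zeros_like(W)
    for i in range(m):
        for j in range(n):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            G[i, j] = (loss(Wp, bias) - loss(Wm, bias)) / (2 * eps)
    return G


max_err = np.max(np.abs(grad_W - numeric_grad_W()))
rank = np.linalg.matrix_rank(grad_W)
print(f"max finite-difference error: {max_err:.2e}")
print(f"rank of grad_W: {rank} (batch size b = {b})")
```

Because `grad_W` is an $m \times n$ matrix of rank at most $b$, it carries far fewer degrees of freedom than its size suggests, and `grad_b` is simply the row-wise sum of the same `dL_dZ` factor.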

Figures (24)

  • Figure 1: A sample of four images from a batch of $b=20$, reconstructed using our SPEAR (top) or the prior state-of-the-art method of Geiping et al. (middle), compared to the ground truth (bottom).
  • Figure 2: Overview of SPEAR. The gradient $\frac{\partial \mathcal{L}}{\partial {\bm{W}}}$ is decomposed into ${\bm{R}}$ and ${\bm{L}}$. Sampling gives $N$ proposal directions, which we filter down to $c$ candidates via a sparsity criterion with threshold $\tau \cdot m$. A greedy selection method then picks $b$ of these directions, where $b$ is the batch size. Scale recovery via $\frac{\partial \mathcal{L}}{\partial {\bm{b}}}$ returns the disaggregation matrix ${\bm{Q}}$ and thus the inputs ${\bm{X}}$.
  • Figure 3: Visualizations of the upper bound ($p_{\text{fail}}^\text{ub}$, dashed) on and approximation of ($p_{\text{fail}}^\text{approx}$, solid) the failure probability of SPEAR for different batch sizes $b$ and network widths $m$ for $p_{fr} = 10^{-9}$.
  • Figure 4: Effect of batch size $b$ on the number of required submatrices. The expectation from the lemma on the expected number of samples is dashed; the median (10$^\text{th}$ to 90$^\text{th}$ percentile shaded), depending on network width $m$, is solid. We always evaluate $10^4$ submatrices in parallel, explaining the plateau.
  • Figure 5: Accuracy (green) and number of median iterations (blue) for different network widths $m$ at $L=6$ (left) and depths $L$ at $m=200$ (right).
  • ...and 19 more figures

Theorems & Definitions (26)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • proof
  • Theorem 3.4
  • Theorem 3.5
  • Definition 4.1
  • Lemma 5.0
  • Lemma 5.0
  • Lemma 5.0
  • ...and 16 more