Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu

TL;DR

The paper addresses the challenge of achieving tight $L_2$ mean estimation under central differential privacy with communication constraints in federated learning. It introduces a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness of sparsification directly into the $L_2$ DP analysis, driving the MSE toward that of the uncompressed Gaussian mechanism. It then extends sparsification to streaming DP within a matrix-factorization DP-FTRL framework, providing a Rényi DP accountant that handles temporal and spatial coupling, and proves a bound for sparsified Gaussian matrix factorization. Empirically, the approach yields at least a $100\times$ improvement in compression for DP-SGD across multiple FL tasks while maintaining competitive accuracy, which makes it practical for communication-efficient private learning.

Abstract

We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges. First, existing mean estimation schemes that handle both constraints simultaneously are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, which yields suboptimal leading constants in mean squared errors (MSEs). Second, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), which makes them incompatible with DP-FTRL-type optimizers. In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm operates directly in $L_2$ geometry, yielding MSEs that quickly converge to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL-type optimizers. Empirically, our method achieves at least a $100\times$ improvement in compression for DP-SGD across various federated learning (FL) tasks.
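The streaming setting mentioned above builds on the matrix-factorization (MF) mechanism used by DP-FTRL: to release prefix sums $A\bm{x}$ of per-round contributions, one factors $A = BC$, privatizes $C\bm{x}$ with Gaussian noise, and post-processes with $B$. The following is a minimal sketch of that generic mechanism only, not the paper's sparsified variant (SGMF). It assumes the square-root factorization $B = C = A^{1/2}$ of the lower-triangular all-ones matrix, whose Toeplitz entries are $c_k = \binom{2k}{k}/4^k$, and a single-participation sensitivity equal to the largest column norm of $C$ times $\Delta_2$. The function names are hypothetical.

```python
import numpy as np


def sqrt_prefix_factor(T: int) -> np.ndarray:
    """Lower-triangular Toeplitz C such that C @ C is the T x T prefix-sum matrix A."""
    coeffs = np.ones(T)
    for k in range(1, T):
        # c_k = binom(2k, k) / 4^k, computed by the recurrence
        # c_k = c_{k-1} * (2k - 1) / (2k) to avoid overflow for large T.
        coeffs[k] = coeffs[k - 1] * (2 * k - 1) / (2 * k)
    C = np.zeros((T, T))
    for i in range(T):
        C[i, : i + 1] = coeffs[i::-1]  # C[i, j] = c_{i - j}
    return C


def mf_prefix_sums(x: np.ndarray, noise_multiplier: float, delta2: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Noisy prefix sums of the rows of x (shape T x d) via A = B C with B = C."""
    T = x.shape[0]
    C = sqrt_prefix_factor(T)
    # Assumed single participation: changing one row of x (norm <= delta2)
    # moves C @ x by at most (largest column norm of C) * delta2.
    sensitivity = float(np.linalg.norm(C, axis=0).max()) * delta2
    z = rng.normal(0.0, noise_multiplier * sensitivity, size=x.shape)
    return C @ (C @ x + z)  # B = C post-processes the privatized C @ x
```

With `noise_multiplier=0.0` the output reduces to exact prefix sums, which is a quick sanity check of the factorization. Calibrating the noise for multiple participations, or for sparsified rows, requires an accountant such as the one the paper develops; this sketch does not attempt that.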

Paper Structure (26 sections, 5 theorems, 25 equations, 8 figures, 3 algorithms)

Key Result

Theorem 4.1

Let $\bm{g}_1, \ldots, \bm{g}_n \in \mathbb{S}^{d-1}(\Delta_2)$ (i.e., $\lVert \bm{g}_i \rVert_2 \leq \Delta_2$), and $\lVert \bm{g}_i \rVert_\infty \leq \Delta_\infty$ for all $i \in [n]$. Let $\hat{\mu}_\mathsf{CSGM}(\bm{g}^n)$ be defined as in eq:CSGM with $\bm{s}_1, \ldots$ …
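Theorem 4.1 concerns the estimator $\hat{\mu}_\mathsf{CSGM}$ built from per-client masks $\bm{s}_i$. Because the defining equation (eq:CSGM) is not reproduced here, the following is only a minimal sketch under assumed conventions: each client clips its vector to $L_2$ norm $\Delta_2$, keeps each coordinate independently with probability $\gamma$ (the sparsification rate in Figure 1), the server adds $\mathcal{N}(0, \sigma^2 I_d)$ noise to the aggregated sum, and rescales by $1/(n\gamma)$. The rescaling makes the estimate unbiased, since $\mathbb{E}[\bm{s}_i \odot \bm{g}_i] = \gamma\,\bm{g}_i$. The helper names are hypothetical, and choosing $\sigma$ with the paper's accountant is not implemented.

```python
import numpy as np


def clip_l2(g: np.ndarray, delta2: float) -> np.ndarray:
    """Scale g down so that ||g||_2 <= delta2 (no-op if already within the bound)."""
    norm = np.linalg.norm(g)
    return g if norm <= delta2 else g * (delta2 / norm)


def csgm_sketch(grads: np.ndarray, gamma: float, sigma: float, delta2: float,
                rng: np.random.Generator) -> np.ndarray:
    """Hypothetical coordinate-sampled Gaussian mean estimate of the rows of grads (n x d)."""
    n, d = grads.shape
    clipped = np.stack([clip_l2(g, delta2) for g in grads])
    # Assumed mask s_i: each coordinate kept independently with probability gamma.
    masks = rng.random((n, d)) < gamma
    aggregate = (masks * clipped).sum(axis=0)  # only sampled coordinates are sent
    noisy = aggregate + rng.normal(0.0, sigma, size=d)  # central Gaussian noise
    return noisy / (n * gamma)  # unbiased because E[s_i * g_i] = gamma * g_i
```

Smaller $\gamma$ means each client sends roughly $\gamma d$ coordinates, at the cost of extra sampling variance; the theorem's point is that this sampling randomness can also be credited toward privacy.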

Figures (8)

  • Figure 1: Noise multipliers (defined as $\sigma/\Delta_2$) of CSGM and GM with $\varepsilon = 5.0$ and $\delta=10^{-8}$. On the left, we fix the sparsification rate $\gamma=0.01$; the numerical results indicate that as the ratio $\Delta_\infty/\Delta_2$ decreases, the noise multiplier of CSGM converges to that of GM. Equivalently, if one fixes the MSEs of both schemes, $\varepsilon_{\mathsf{CSGM}}(\alpha) \rightarrow \varepsilon_{\mathsf{GM}}(\alpha)$. On the right, we fix the ratio $\Delta_2/\Delta_\infty$ to $1000$ and plot the noise multipliers.
  • Figure 2: Accuracy of GM and CSGM, with $\delta = 10^{-5}$ for F-EMNIST and $\delta = 10^{-6}$ for SONWP. The resulting $\Delta_\infty/\Delta_2$ value is $6.4\cdot 10^{-3}$ for F-EMNIST and $3.3\cdot 10^{-3}$ for SONWP.
  • Figure 3: Accuracy of MF and SGMF, with $\delta = 10^{-5}$, cohort size $n= 100$, clipped norm $\Delta_2 = 1.0$, and server learning rate $0.1$.
  • Figure 4: Accuracy of GM and CSGM, with $\delta = 10^{-5}$ and cohort size $1000$. The $\Delta_\infty/\Delta_2$ ratio is $6.4\cdot 10^{-3}$ for F-EMNIST.
  • Figure 5: Accuracy of GM and CSGM, with $\delta = 10^{-5}$ and cohort size $100$. The $\Delta_\infty/\Delta_2$ ratio is $6.4\cdot 10^{-3}$ for F-EMNIST.
  • ...and 3 more figures
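Figure 1 measures CSGM against the uncompressed Gaussian mechanism (GM). As a baseline illustration, the following sketch computes a GM noise multiplier $z = \sigma/\Delta_2$ for a single release at given $(\varepsilon, \delta)$, using the standard Rényi DP curve of the Gaussian mechanism, $\varepsilon(\alpha) = \alpha / (2z^2)$, and the simple conversion $\varepsilon = \min_\alpha \left[\varepsilon(\alpha) + \log(1/\delta)/(\alpha - 1)\right]$. This conversion is looser than those used by production accountants, so the result is an upper estimate and need not match the figure; the $\alpha$ grid and bisection bounds are arbitrary choices, and the function names are hypothetical.

```python
import math


def gm_rdp(alpha: float, z: float) -> float:
    """Renyi DP of order alpha for the Gaussian mechanism with noise multiplier z."""
    return alpha / (2.0 * z * z)


def rdp_to_dp(z: float, delta: float, alphas: list) -> float:
    """(eps, delta)-DP via eps = min_alpha [eps_RDP(alpha) + log(1/delta) / (alpha - 1)]."""
    return min(gm_rdp(a, z) + math.log(1.0 / delta) / (a - 1.0) for a in alphas)


def gm_noise_multiplier(eps: float, delta: float, lo: float = 1e-3,
                        hi: float = 100.0, iters: int = 100) -> float:
    """Smallest z (up to bisection tolerance) whose converted epsilon is at most eps."""
    # Grid of Renyi orders: 1.1, 1.2, ..., 100.9, then 101, ..., 1000.
    alphas = [1.0 + x / 10.0 for x in range(1, 1000)] + [float(a) for a in range(101, 1001)]
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if rdp_to_dp(mid, delta, alphas) > eps:
            lo = mid  # too little noise: epsilon budget exceeded
        else:
            hi = mid  # budget met: try less noise
    return hi  # hi always satisfies the budget
```

For $\varepsilon = 5$ and $\delta = 10^{-8}$ this returns a value near 1.3. Epsilon decreases monotonically in $z$, which is why bisection is sufficient here.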

Theorems & Definitions (8)

  • Definition 3.1: Differential Privacy (dwork2006calibrating)
  • Definition 3.2: Neighboring datasets
  • Remark 3.3
  • Theorem 4.1
  • Corollary 4.2
  • Lemma 4.3
  • Theorem 5.1
  • Lemma 5.2