SPriFed-OMP: A Differentially Private Federated Learning Algorithm for Sparse Basis Recovery

Ajinkya Kiran Mulay, Xiaojun Lin

TL;DR

This work tackles sparse basis recovery in high-dimensional Federated Learning (FL) under differential privacy (DP). It introduces SPriFed-OMP, which adapts Orthogonal Matching Pursuit (OMP) to DP-FL via NoisySMPC, enabling exact support recovery with $n = O(\sqrt{p})$ samples under the Restricted Isometry Property (RIP). An enhanced gradient-privatized variant, SPriFed-OMP-GRAD, further improves utility, especially with gradient clipping. The theoretical results bound the estimation error by $O\left(\sqrt{\frac{s \log s}{n}}\right)$ and the empirical risk by $O\left(\frac{s \log s}{n}\right)$, both independent of the ambient dimension $p$. Empirical results on synthetic and real data show substantial gains over DP-SGD and related baselines. Together, the methods offer a practical approach to private, high-dimensional sparse recovery in FL with provable guarantees and favorable privacy-utility trade-offs.

Abstract

Sparse basis recovery is a classical and important statistical learning problem when the number of model dimensions $p$ is much larger than the number of samples $n$. However, there has been little work that studies sparse basis recovery in the Federated Learning (FL) setting, where the differential privacy (DP) of the client data must also be protected. In particular, the performance guarantees of existing DP-FL algorithms (such as DP-SGD) degrade significantly when $p \gg n$, and thus they fail to learn the true underlying sparse model accurately. In this work, we develop a new differentially private sparse basis recovery algorithm for the FL setting, called SPriFed-OMP. SPriFed-OMP adapts OMP (Orthogonal Matching Pursuit) to the FL setting. Further, it combines SMPC (secure multi-party computation) and DP to ensure that only a small amount of noise needs to be added in order to achieve differential privacy. As a result, SPriFed-OMP can efficiently recover the true sparse basis for a linear model with only $n = O(\sqrt{p})$ samples. We further present an enhanced version of our approach, SPriFed-OMP-GRAD, based on gradient privatization, which improves the performance of SPriFed-OMP. Our theoretical analysis and empirical results demonstrate that both SPriFed-OMP and SPriFed-OMP-GRAD terminate in a small number of steps, and that they significantly outperform the previous state-of-the-art DP-FL solutions in terms of the accuracy-privacy trade-off.
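
To make the idea in the abstract concrete, the following is a minimal sketch of OMP run over data split across clients, with Gaussian noise added once to the summed correlations. It is not the authors' SPriFed-OMP: the function name `noisy_federated_omp`, the `(X_k, y_k)` client layout, the `noise_std` parameter, and the non-private least-squares refit are all assumptions made for illustration, and the secure aggregation step is only simulated by a plain sum.

```python
import numpy as np

def noisy_federated_omp(client_data, s, noise_std, rng):
    """Toy OMP over horizontally partitioned data (illustrative only).

    client_data: list of (X_k, y_k) pairs, one per client (assumed layout).
    s: number of basis elements (columns) to select.
    noise_std: std of the Gaussian noise added to the *summed* correlations,
               standing in for one noise draw inside a secure aggregation.
    """
    p = client_data[0][0].shape[1]
    X_all = np.vstack([X for X, _ in client_data])
    y_all = np.concatenate([y for _, y in client_data])
    support = []
    beta = np.zeros(p)
    for _ in range(s):
        # Each client computes X_k^T (y_k - X_k beta) locally; only the sum
        # plus a single noise vector is assumed to reach the server.
        corr = sum(X.T @ (y - X @ beta) for X, y in client_data)
        corr = corr + rng.normal(0.0, noise_std, size=p)
        if support:
            corr[support] = 0.0  # never reselect an already chosen column
        support.append(int(np.argmax(np.abs(corr))))
        # Refit by least squares on the selected columns.
        # Assumption: this refit is NOT privatized in this sketch.
        coef, *_ = np.linalg.lstsq(X_all[:, support], y_all, rcond=None)
        beta = np.zeros(p)
        beta[support] = coef
    return sorted(support), beta

# Usage on synthetic data: 4 clients, 50 samples each, p = 50, s = 3.
rng = np.random.default_rng(0)
beta_true = np.zeros(50)
beta_true[[3, 17, 42]] = [5.0, -4.0, 3.0]
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 50))
    clients.append((X, X @ beta_true))
support, beta_hat = noisy_federated_omp(clients, s=3, noise_std=1.0, rng=rng)
print(support)
```

The point of the sketch is the placement of the noise: it is added once to the aggregate rather than once per client, which is the intuition behind combining SMPC with DP so that only a small amount of noise is needed.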

Paper Structure

This paper contains 28 sections, 36 theorems, 95 equations, 4 figures, 2 tables, 5 algorithms.

Key Result

Lemma 1

[GDP mechanism (Theorem 2.7 from dong2019gaussian)] Let $f: \mathbb{R}^{n \times p} \rightarrow \mathbb{R}^p$ be some function computed over the dataset $\bm{X} \in \mathbb{R}^{n \times p}$. Then, the randomized Gaussian mechanism $\mathcal{M}(\bm{X}) = f(\bm{X}) + \eta$ is $\mu$-GDP where $\eta \sim \dots$
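
The lemma statement above is cut off before the noise distribution is given. As a hedged illustration, the sketch below assumes the standard calibration from the Gaussian differential privacy literature, $\eta \sim \mathcal{N}(0, (\Delta_2 / \mu)^2 I)$, where $\Delta_2$ is the $L_2$ sensitivity of $f$. The function name `gdp_gaussian_mechanism` and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def gdp_gaussian_mechanism(value, l2_sensitivity, mu, rng):
    """Release value + N(0, sigma^2 I) with sigma = l2_sensitivity / mu.

    Assumption: this is the usual mu-GDP calibration of the Gaussian
    mechanism; the paper's exact statement is truncated in the source.
    """
    if l2_sensitivity <= 0 or mu <= 0:
        raise ValueError("l2_sensitivity and mu must be positive")
    sigma = l2_sensitivity / mu
    value = np.asarray(value, dtype=float)
    noise = rng.normal(0.0, sigma, size=value.shape)
    return value + noise, sigma

# Usage: privatize a p-dimensional statistic with sensitivity 2 at mu = 0.5.
rng = np.random.default_rng(0)
private_value, sigma = gdp_gaussian_mechanism(np.zeros(5), 2.0, 0.5, rng)
print(sigma)  # noise standard deviation used for this release
```

A smaller $\mu$ means stronger privacy and therefore a larger noise standard deviation for the same sensitivity.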

Figures (4)

  • Figure 1: Enhancement 2 Performance Intuition: Change in size of artifacts (correlations and gradient) measured by their norm over multiple iterations.
  • Figure 2: Test MSE of SPriFed-OMP and DP-SGD for privacy parameters $(4.94, 10^{-4})$. Panel (a) fixes the sample size and varies the model dimension; panel (b) fixes the model dimension and varies the sample size. Measurements are averaged over $3$ randomized simulation runs.
  • Figure 3: The number of basis elements correctly recovered by SPriFed-OMP for privacy parameters $(4.94, 10^{-4})$, with sparsity $s=5$. Panel (a) shows basis recovery for model dimension $p=2500$ over varying sample sizes; panel (b) shows basis recovery for model dimension $p=10000$ over varying sample sizes. Measurements are averaged over $3$ randomized simulation runs.
  • Figure 4: The number of basis elements correctly recovered by SPriFed-OMP for privacy parameters $(5.34, 10^{-4})$, with sparsity $s=10$. Panels (a) and (b) use an additive error with standard deviation $\sigma_{\varepsilon} = 0.001$: panel (a) shows basis recovery for model dimension $p=2500$ over varying sample sizes, and panel (b) shows basis recovery for model dimension $p=10000$ over varying sample sizes. Panel (c) uses $p=20000$ and $p=40000$ while increasing the additive error's standard deviation to $\sigma_{\varepsilon} = 0.1$. Measurements are averaged over $3$ randomized simulation runs.

Theorems & Definitions (44)

  • Definition 2.1
  • Definition 4.1: $L_2$ sensitivity of a function
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Theorem 6
  • ...and 34 more