SPriFed-OMP: A Differentially Private Federated Learning Algorithm for Sparse Basis Recovery

Ajinkya Kiran Mulay; Xiaojun Lin

TL;DR

This work tackles sparse basis recovery in high-dimensional Federated Learning (FL) under differential privacy (DP). It introduces SPriFed-OMP, which adapts Orthogonal Matching Pursuit to DP-FL via NoisySMPC and achieves exact support recovery with $n = O(\sqrt{p})$ samples under RIP. An enhanced gradient-privatized variant, SPriFed-OMP-GRAD, further improves utility, especially with gradient clipping. Theoretical results bound the estimation error and the empirical risk by $O\left(\sqrt{\frac{s \log s}{n}}\right)$ and $O\left(\frac{s \log s}{n}\right)$ respectively, independent of the ambient dimension $p$, and empirical results on synthetic and real data show substantial gains over DP-SGD and related baselines. Together, the methods offer a practical approach to private, high-dimensional sparse recovery in FL with provable guarantees and favorable privacy-utility trade-offs.

Abstract

Sparse basis recovery is a classical and important statistical learning problem when the number of model dimensions $p$ is much larger than the number of samples $n$. However, there has been little work that studies sparse basis recovery in the Federated Learning (FL) setting, where the client data's differential privacy (DP) must also be simultaneously protected. In particular, the performance guarantees of existing DP-FL algorithms (such as DP-SGD) will degrade significantly when $p \gg n$, and thus, they will fail to learn the true underlying sparse model accurately. In this work, we develop a new differentially private sparse basis recovery algorithm for the FL setting, called SPriFed-OMP. SPriFed-OMP converts OMP (Orthogonal Matching Pursuit) to the FL setting. Further, it combines SMPC (secure multi-party computation) and DP to ensure that only a small amount of noise needs to be added in order to achieve differential privacy. As a result, SPriFed-OMP can efficiently recover the true sparse basis for a linear model with only $n = O(\sqrt{p})$ samples. We further present an enhanced version of our approach, SPriFed-OMP-GRAD based on gradient privatization, that improves the performance of SPriFed-OMP. Our theoretical analysis and empirical results demonstrate that both SPriFed-OMP and SPriFed-OMP-GRAD terminate in a small number of steps, and they significantly outperform the previous state-of-the-art DP-FL solutions in terms of the accuracy-privacy trade-off.
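The greedy selection loop that the abstract refers to can be illustrated with a minimal, centralized sketch. This is not the authors' federated protocol: the function name `noisy_omp`, the parameter `sigma`, and the synthetic data are assumptions made for illustration, a single Gaussian draw stands in for the aggregate noise that an SMPC plus DP scheme would add, and `sigma` is not calibrated to any privacy budget.

```python
import numpy as np

def noisy_omp(X, y, s, sigma, rng):
    """Greedy OMP in which each correlation vector is perturbed by Gaussian noise.

    Assumption: one N(0, sigma^2) draw per coordinate stands in for the
    aggregate noise of an SMPC + DP protocol; sigma is NOT calibrated to
    any (epsilon, delta) budget in this sketch.
    """
    n, p = X.shape
    support = []
    residual = y.copy()
    for _ in range(s):
        # Correlation of every column with the current residual, plus noise.
        corr = X.T @ residual + rng.normal(0.0, sigma, size=p)
        if support:
            corr[support] = 0.0  # never reselect an already chosen column
        support.append(int(np.argmax(np.abs(corr))))
        # Least-squares refit on the selected columns (left non-private here).
        beta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta_s
    beta = np.zeros(p)
    beta[support] = beta_s  # beta_s is ordered like `support`
    return sorted(support), beta

# Hypothetical synthetic problem with p > n and an s-sparse ground truth.
rng = np.random.default_rng(0)
n, p, s = 400, 1000, 5
X = rng.normal(size=(n, p)) / np.sqrt(n)  # columns have roughly unit norm
true_support = sorted(rng.choice(p, size=s, replace=False).tolist())
beta_true = np.zeros(p)
beta_true[true_support] = 5.0
y = X @ beta_true
support, beta_hat = noisy_omp(X, y, s, sigma=0.1, rng=rng)
```

The sketch perturbs only the column-selection step. A private variant would also have to protect the refit, which is left out here to keep the example short.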

Paper Structure (28 sections, 36 theorems, 95 equations, 4 figures, 2 tables, 5 algorithms)


Introduction
Related Work
System Model
The SPriFed-OMP Algorithm
The SPriFed-OMP-GRAD Algorithm
Privacy Analysis
Accuracy of Private Orthogonal Matching Pursuit
Proof Sketch of the Performance Theorems for SPriFed-OMP and SPriFed-OMP-GRAD
Empirical Results
Warm-up Experiments: Intuition behind our proposed enhancements
Performance Comparison: Synthetic Data Sets
Performance Comparison: More Realistic Data Sets
Conclusion
Differential Privacy
Compressed Sensing Theory
...and 13 more sections

Key Result

Lemma 1

[GDP mechanism (Theorem 2.7 from dong2019gaussian)] Let $f: \mathbb{R}^{n \times p} \rightarrow \mathbb{R}^p$ be some function computed over the dataset $\bm{X} \in \mathbb{R}^{n \times p}$. Then, the randomized Gaussian mechanism $\mathcal{M}(\bm{X}) = f(\bm{X}) + \eta$ is $\mu$-GDP, where $\eta \sim \dots$
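The statement of the lemma is cut off here before the noise distribution is given. As a hedged illustration, the sketch below assumes the calibration usually associated with the cited theorem, namely isotropic Gaussian noise with standard deviation equal to the $L_2$ sensitivity of $f$ divided by $\mu$. The choice of $f$ (a column mean over norm-clipped rows), the replace-one neighbouring relation, and the names `clip_rows` and `private_column_mean` are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def clip_rows(X, clip_norm):
    """Scale each row so that its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

def private_column_mean(X, clip_norm, mu, rng):
    """Release f(X) + eta, where f is the column mean of the clipped rows.

    Assumption: eta ~ N(0, (sensitivity / mu)^2 I), the calibration we take
    the cited GDP theorem to prescribe; the lemma text above is truncated.
    """
    n, p = X.shape
    f = clip_rows(X, clip_norm).mean(axis=0)
    # Replacing one clipped row moves the mean by at most 2 * clip_norm / n.
    sensitivity = 2.0 * clip_norm / n
    sigma = sensitivity / mu
    return f + rng.normal(0.0, sigma, size=p), sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
released, sigma = private_column_mean(X, clip_norm=1.0, mu=0.5, rng=rng)
```

Clipping is what makes the sensitivity finite: without a bound on the row norms, a single record could move the mean arbitrarily far and no finite noise scale would suffice.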

Figures (4)

Figure 1: Enhancement 2 Performance Intuition: Change in size of artifacts (correlations and gradient) measured by their norm over multiple iterations.
Figure 2: Test MSE is shown for both SPriFed-OMP and DP-SGD for privacy parameters $(4.94, 10^{-4})$. Figure (a) fixes the sample size and varies the model dimensions; Figure (b) fixes the model dimensions and varies the sample size. Measurements averaged over $3$ randomized simulation runs.
Figure 3: The number of basis elements correctly recovered by SPriFed-OMP for privacy parameters $(4.94, 10^{-4})$. We choose $s=5$. Figure (a) demonstrates basis recovery for model dimension $p=2500$ over varying sample sizes. Figure (b) demonstrates basis recovery for model dimension $p=10000$ over varying sample sizes. Measurements averaged over $3$ randomized simulation runs.
Figure 4: The number of basis elements correctly recovered by SPriFed-OMP for privacy parameters $(5.34, 10^{-4})$. We choose $s=10$, and figures (a) and (b) have an additive error with standard deviation $\sigma_{\varepsilon} = 0.001$. Figure (a) demonstrates basis recovery for model dimension $p=2500$ over varying sample sizes. Figure (b) demonstrates basis recovery for model dimension $p=10000$ over varying sample sizes. Figure (c) uses $p=20000$ and $p=40000$ while increasing the additive error's standard deviation to $\sigma_{\varepsilon} = 0.1$. Measurements averaged over $3$ randomized simulation runs.

Theorems & Definitions (44)

Definition 2.1
Definition 4.1: $L_2$ sensitivity of a function
Lemma 1
Lemma 2
Lemma 3
Lemma 4
proof
Lemma 5
proof
Theorem 6
...and 34 more
