SPriFed-OMP: A Differentially Private Federated Learning Algorithm for Sparse Basis Recovery
Ajinkya Kiran Mulay, Xiaojun Lin
TL;DR
This work tackles sparse basis recovery in high-dimensional Federated Learning (FL) under differential privacy (DP). It introduces SPriFed-OMP, which adapts Orthogonal Matching Pursuit (OMP) to DP-FL via NoisySMPC and achieves exact support recovery with $n = O(\sqrt{p})$ samples under the Restricted Isometry Property (RIP). An enhanced gradient-privatized variant, SPriFed-OMP-GRAD, further improves utility, especially with gradient clipping. Theoretical results bound the estimation error and the empirical risk by $O\left(\sqrt{\frac{s \log s}{n}}\right)$ and $O\left(\frac{s \log s}{n}\right)$, respectively, independent of the ambient dimension $p$. Empirical results on synthetic and real data show substantial gains over DP-SGD and related baselines. Together, the two methods offer a practical approach to private, high-dimensional sparse recovery in FL with provable guarantees and favorable privacy-utility trade-offs.
Abstract
Sparse basis recovery is a classical and important statistical learning problem when the number of model dimensions $p$ is much larger than the number of samples $n$. However, little work has studied sparse basis recovery in the Federated Learning (FL) setting, where the clients' data must simultaneously be protected under differential privacy (DP). In particular, the performance guarantees of existing DP-FL algorithms (such as DP-SGD) degrade significantly when $p \gg n$, so these algorithms fail to learn the true underlying sparse model accurately. In this work, we develop a new differentially private sparse basis recovery algorithm for the FL setting, called SPriFed-OMP. SPriFed-OMP adapts Orthogonal Matching Pursuit (OMP) to the FL setting. Further, it combines secure multi-party computation (SMPC) and DP so that only a small amount of noise needs to be added to achieve differential privacy. As a result, SPriFed-OMP can efficiently recover the true sparse basis for a linear model with only $n = O(\sqrt{p})$ samples. We further present an enhanced version of our approach, SPriFed-OMP-GRAD, which is based on gradient privatization and improves the performance of SPriFed-OMP. Our theoretical analysis and empirical results demonstrate that both SPriFed-OMP and SPriFed-OMP-GRAD terminate in a small number of steps and significantly outperform the previous state-of-the-art DP-FL solutions in terms of the accuracy-privacy trade-off.
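
To make the idea of combining OMP with securely aggregated, noised statistics concrete, the following is a minimal Python sketch. It is not the SPriFed-OMP algorithm itself. It is a toy illustration under several assumptions: the function name `noisy_federated_omp`, the `noise_std` parameter, and the toy problem sizes are hypothetical; secure aggregation is simulated by a plain sum; the noise scale is not calibrated to any privacy budget; and the least-squares refit is done in the clear, which a private algorithm would not do. The problem is also kept small and well-conditioned (more samples than dimensions) so that the example is easy to check.

```python
import numpy as np


def noisy_federated_omp(client_X, client_y, s, noise_std, rng):
    """Hypothetical sketch: greedy OMP driven by a noisy aggregated correlation."""
    p = client_X[0].shape[1]
    support = []
    beta = np.zeros(p)
    for _ in range(s):
        # Each client correlates its features with its own local residual.
        local_corr = [X.T @ (y - X @ beta) for X, y in zip(client_X, client_y)]
        # Stand-in for secure aggregation: only the sum is revealed, and a
        # single draw of Gaussian noise is added to that sum (assumed,
        # uncalibrated noise scale).
        agg = np.sum(local_corr, axis=0) + rng.normal(0.0, noise_std, size=p)
        # Ignore coordinates that were already selected.
        if support:
            agg[support] = 0.0
        support.append(int(np.argmax(np.abs(agg))))
        # Refit on the selected columns (non-private here, for brevity).
        X_sel = np.vstack(client_X)[:, support]
        y_all = np.concatenate(client_y)
        coef, *_ = np.linalg.lstsq(X_sel, y_all, rcond=None)
        beta = np.zeros(p)
        beta[support] = coef
    return sorted(support), beta


# Toy data: 4 clients, 100 samples each, a 3-sparse linear model in 50 dimensions.
rng = np.random.default_rng(0)
n_clients, n_per_client, p = 4, 100, 50
true_support = [3, 17, 42]
beta_true = np.zeros(p)
beta_true[true_support] = [1.0, -1.0, 1.0]
client_X = [rng.standard_normal((n_per_client, p)) for _ in range(n_clients)]
client_y = [X @ beta_true for X in client_X]

support, beta_hat = noisy_federated_omp(
    client_X, client_y, s=3, noise_std=1.0, rng=rng
)
print(support)
```

The point of the design is that noise is added once to the aggregate rather than once per client, which mirrors the abstract's claim that combining SMPC with DP keeps the required noise small. How the noise is calibrated, and how the refit step is privatized, are details this sketch does not attempt to reproduce.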
