The Cost of Shuffling in Private Gradient Based Optimization
Shuli Jiang, Pranay Sharma, Zhiwei Steven Wu, Gauri Joshi
TL;DR
The paper analyzes differentially private (DP) convex empirical risk minimization (ERM) solved with shuffled gradient methods and shows that, because of data shuffling, DP-ShuffleG incurs worse empirical excess risk than DP-SGD. To mitigate this, it introduces a generalized shuffled gradient framework with surrogate objectives, adaptive noise injection, and a dissimilarity measure, so that the convergence analysis can account for the dissimilarity between the surrogate objectives and the target objective. It then proposes Interleaved-ShuffleG, which interleaves private and public data within each epoch to exploit public samples while preserving the privacy guarantee, and supports it with a convergence and privacy analysis built on privacy amplification by iteration (PABI) and Stein's lemma. Experiments on diverse datasets and tasks show that Interleaved-ShuffleG achieves lower empirical excess risk than DP-ShuffleG and public-data baselines, especially under strong privacy constraints.
Abstract
We consider the problem of differentially private (DP) convex empirical risk minimization (ERM). While the standard DP-SGD algorithm is theoretically well-established, practical implementations often rely on shuffled gradient methods that traverse the training data sequentially rather than sampling with replacement in each iteration. Despite their widespread use, the theoretical privacy-accuracy trade-offs of private shuffled gradient methods (DP-ShuffleG) remain poorly understood, leading to a gap between theory and practice. In this work, we leverage privacy amplification by iteration (PABI) and a novel application of Stein's lemma to provide the first empirical excess risk bound of DP-ShuffleG. Our result shows that data shuffling results in worse empirical excess risk for DP-ShuffleG compared to DP-SGD. To address this limitation, we propose Interleaved-ShuffleG, a hybrid approach that integrates public data samples in private optimization. By alternating optimization steps that use private and public samples, Interleaved-ShuffleG effectively reduces empirical excess risk. Our analysis introduces a new optimization framework with surrogate objectives, adaptive noise injection, and a dissimilarity metric, which can be of independent interest. Our experiments on diverse datasets and tasks demonstrate the superiority of Interleaved-ShuffleG over several baselines.
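To make the idea of shuffled, noise-perturbed gradient steps concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a squared loss, per-sample gradient clipping, one private pass followed by one public pass per epoch, and noise-free public steps; all of these choices, along with the names `noisy_shuffled_gd`, `sigma`, and `clip_norm`, are illustrative assumptions. The noise scale is not calibrated to any privacy budget, and the adaptive noise schedule mentioned in the abstract is not reproduced. With an empty `public` list the loop reduces to a plain noisy shuffled gradient method.

```python
import random


def clip(g, c):
    """Scale gradient g so that its Euclidean norm is at most c."""
    norm = sum(x * x for x in g) ** 0.5
    if norm <= c or norm == 0.0:
        return list(g)
    return [x * c / norm for x in g]


def grad_sq_loss(w, x, y):
    """Gradient of 0.5 * (<w, x> - y)^2 with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def noisy_shuffled_gd(private, public, dim, epochs, lr, sigma, clip_norm, seed=0):
    """Illustrative shuffled gradient loop with Gaussian noise on private steps.

    private, public: lists of (x, y) pairs, where x is a list of length dim.
    sigma: standard deviation of the per-coordinate noise (assumed, uncalibrated).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        # Private pass: visit every sample once, in a fresh random order.
        order = list(range(len(private)))
        rng.shuffle(order)
        for i in order:
            x, y = private[i]
            g = clip(grad_sq_loss(w, x, y), clip_norm)
            # Perturb each coordinate of the clipped gradient with Gaussian noise.
            w = [wi - lr * (gi + rng.gauss(0.0, sigma)) for wi, gi in zip(w, g)]
        # Public pass (assumption: no noise is added on public samples).
        pub_order = list(range(len(public)))
        rng.shuffle(pub_order)
        for j in pub_order:
            x, y = public[j]
            g = clip(grad_sq_loss(w, x, y), clip_norm)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Calling `noisy_shuffled_gd(private, public, dim=1, epochs=10, lr=0.1, sigma=0.5, clip_norm=1.0)` on toy data and varying `sigma` gives a rough feel for how injected noise degrades the final iterate, and how extra noise-free public steps can pull it back toward a minimizer when the public data resembles the private data.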
