Data Efficient Subset Training with Differential Privacy
Ninad Jayesh Gandhi, Moparthy Venkata Subrahmanya Sri Harsha
TL;DR
The paper tackles data-efficient training under differential privacy by adapting GLISTER to the private setting. The resulting method, GLISTER-DP, combines DP-SGD for inner training with a differentially private exponential mechanism for submodular subset selection. It formalizes a two-level objective over a subset $S$ of size $|S|\le k$ and splits the privacy budget into $\varepsilon_g$ for training and $\varepsilon_{ss}$ for subset selection. Empirically, training on the full dataset (FULL-DP) remains strongest, while RANDOM-DP often outperforms GLISTER-DP for $\varepsilon \in \{3,8\}$, indicating that the privacy budget allocated to subset search is too restrictive to yield high-quality subsets. The results imply that practical private data-efficient training remains challenging; the main takeaways are to allocate more of the budget to training than to subset selection and to explore improved DP subset selection methods.
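To make the budget split and the exponential-mechanism selection step concrete, here is a minimal sketch. It is not the paper's implementation. The function names, the `train_fraction` parameter, the even per-step split of $\varepsilon_{ss}$ under basic sequential composition, and the toy `gain` function are all assumptions made for illustration; the paper's privacy accounting and its GLISTER gain may differ. DP-SGD training is not shown.

```python
import math
import random


def split_budget(eps_total, train_fraction):
    """Split a total budget into (eps_g, eps_ss).

    Assumes basic sequential composition, so eps_g + eps_ss = eps_total.
    `train_fraction` is a hypothetical knob, not a parameter from the paper.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be in (0, 1)")
    eps_g = eps_total * train_fraction
    return eps_g, eps_total - eps_g


def exponential_mechanism(scores, eps, sensitivity, rng):
    """Sample index i with probability proportional to
    exp(eps * scores[i] / (2 * sensitivity))."""
    logits = [eps * s / (2.0 * sensitivity) for s in scores]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def private_greedy_subset(n, k, gain, eps_ss, sensitivity, rng):
    """Greedily pick k of n items, one exponential-mechanism draw per step.

    Assumption: eps_ss is divided evenly over the k steps.
    `gain(selected, i)` is a placeholder for the marginal gain of item i.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    selected = []
    remaining = list(range(n))
    eps_step = eps_ss / k
    for _ in range(k):
        scores = [gain(selected, i) for i in remaining]
        j = exponential_mechanism(scores, eps_step, sensitivity, rng)
        selected.append(remaining.pop(j))
    return selected


# Toy usage: a modular gain over made-up per-example values.
values = [0.1, 0.9, 0.5, 0.7, 0.3]
eps_g, eps_ss = split_budget(8.0, 0.75)
subset = private_greedy_subset(
    n=len(values),
    k=2,
    gain=lambda selected, i: values[i],
    eps_ss=eps_ss,
    sensitivity=1.0,
    rng=random.Random(0),
)
```

With a small per-step budget the sampling weights are nearly uniform, so the selected subset behaves much like a random one. This is consistent with the observation above that RANDOM-DP often matches or beats GLISTER-DP at practical budgets.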
Abstract
Private machine learning introduces a trade-off between the privacy budget and training performance: training convergence is substantially slower, and extensive hyperparameter tuning is required. Consequently, efficient methods for private model training have been thoroughly investigated in the literature. To this end, we investigate the effectiveness of data-efficient model training methods in the private training setting. We adapt GLISTER (Killamsetty et al., 2021b) to the private setting and extensively assess its performance. We find empirically that practical choices of privacy budget are too restrictive for data-efficient training in the private setting.
