Oracle-Efficient Differentially Private Learning with Public Data
Adam Block, Mark Bun, Rathin Desai, Abhishek Shetty, Steven Wu
TL;DR
The paper develops oracle-efficient, semi-private learning algorithms that use public unlabeled data to make differentially private learning possible whenever the target concept class is learnable non-privately. The algorithms combine an ERM-based initial estimator with a noise-perturbation release step and rely on the private distribution being σ-smooth with respect to the public one. This yields both DP guarantees and PAC learning guarantees, with sample complexity polynomial in the Gaussian complexity and VC dimension of the class. The authors give a general algorithm, a variant for convex function classes with improved sample complexity, and a specialized private classifier, RRSPM, that achieves significantly better rates for binary classification. The work connects oracle-efficient private learning with smoothed learning and domain adaptation, and the resulting learners scale with standard complexity measures and tolerate distribution shift between private and public data. Overall, the results show that private learning with public data is computationally feasible, in the oracle model, whenever non-private learning is possible.
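For reference, the two notions the summary relies on can be written out explicitly. The block below gives the standard textbook forms of (ε, δ)-differential privacy and of σ-smoothness; it is a sketch, not a quotation from the paper. The symbols A (the algorithm), S and S' (neighboring private samples), E (an output event), μ (the public distribution), and ν (the private distribution) are notation assumed here, and the paper's exact formulation may differ.

```latex
% (epsilon, delta)-differential privacy with respect to the private sample:
% for all private samples S, S' differing in one point and all measurable events E,
\Pr\left[ A(S) \in E \right] \;\le\; e^{\varepsilon} \, \Pr\left[ A(S') \in E \right] + \delta .

% sigma-smoothness of the private distribution \nu with respect to the public
% distribution \mu: \nu is absolutely continuous with respect to \mu and
\left\| \frac{d\nu}{d\mu} \right\|_{\infty} \;\le\; \frac{1}{\sigma},
\qquad 0 < \sigma \le 1 .
```

Under this reading, σ = 1 means the private and public distributions coincide, and smaller σ allows a larger shift between them.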
Abstract
Due to statistical lower bounds on the learnability of many function classes under privacy constraints, there has been recent interest in leveraging public data to improve the performance of private learning algorithms. In this model, algorithms must always guarantee differential privacy with respect to the private samples while also ensuring learning guarantees when the private data distribution is sufficiently close to that of the public data. Previous work has demonstrated that when sufficient public, unlabelled data is available, private learning can be made statistically tractable, but the resulting algorithms have all been computationally inefficient. In this work, we present the first computationally efficient algorithms to provably leverage public data to learn privately whenever a function class is learnable non-privately, where our notion of computational efficiency is with respect to the number of calls to an optimization oracle for the function class. In addition to this general result, we provide specialized algorithms with improved sample complexities in the special cases when the function class is convex or when the task is binary classification.
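To make the notion of oracle efficiency concrete, the following minimal sketch shows what "number of calls to an optimization oracle" means as a cost measure. Everything in it is hypothetical and chosen for illustration: the class `CountingERMOracle`, the toy function class of one-dimensional threshold classifiers, and the sample data are assumptions, not the paper's algorithm, and the sketch provides no privacy guarantee.

```python
from typing import List, Sequence, Tuple


class CountingERMOracle:
    """Hypothetical ERM oracle over thresholds h_t(x) = 1 if x >= t else 0.

    The counter `calls` is the quantity an oracle-efficient algorithm
    must keep small; the work done inside one call is treated as free.
    """

    def __init__(self, thresholds: Sequence[float]) -> None:
        self.thresholds: List[float] = list(thresholds)
        self.calls = 0

    def __call__(self, sample: Sequence[Tuple[float, int]]) -> float:
        """Return the threshold with the fewest mistakes on `sample`."""
        self.calls += 1

        def empirical_error(t: float) -> int:
            # Count points whose predicted label disagrees with the true label.
            return sum(int(x >= t) != y for x, y in sample)

        return min(self.thresholds, key=empirical_error)


# Toy labeled sample: points below 0.5 are labeled 0, points above are labeled 1.
oracle = CountingERMOracle([0.0, 0.25, 0.5, 0.75, 1.0])
data = [(0.1, 0), (0.3, 0), (0.6, 1), (0.9, 1)]
best_threshold = oracle(data)
print(best_threshold, oracle.calls)
```

An algorithm built on such an oracle is judged by how `oracle.calls` grows with the problem parameters, independent of how the oracle itself is implemented for a given function class.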
