Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy
Bo Li, Wei Wang, Peng Ye
TL;DR
This work analyzes pure differential privacy (DP) in agnostic learning, presenting near-optimal item-level sample complexity bounds and tighter user-level bounds than prior work. The authors develop a surrogate-error learning approach for item-level DP and extend it to user-level DP by exploiting a total-variation amplification of Binomial distributions, which lowers the number of users required. They also obtain a near-optimal bound for learning thresholds under user-level privacy via a private median-based binary-search strategy. Collectively, the results reduce the additional data required for privacy, approaching non-private sample complexity in high-accuracy regimes, and outline directions for further tightening bounds for general concept classes.
Abstract
Machine learning has made remarkable progress in a wide range of fields. In many scenarios, learning is performed on datasets containing sensitive information, where privacy protection is essential. In this work, we study pure private learning in the agnostic model, a framework that reflects how learning is carried out in practice. We examine the number of users required under item-level privacy (where each user contributes one example) and user-level privacy (where each user contributes multiple examples) and derive several improved upper bounds. For item-level privacy, our algorithm achieves a near-optimal bound for general concept classes. We extend this to the user-level setting, yielding a tighter upper bound than the one proved by Ghazi et al. (2023). Lastly, we consider the problem of learning thresholds under user-level privacy and present an algorithm with nearly tight user complexity.
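To make the item-level setting concrete, the sketch below shows a textbook baseline for pure private agnostic learning of thresholds: the exponential mechanism over a finite candidate grid. It is not the surrogate-error or median-based algorithm of this work and says nothing about the bounds claimed here. The function names, the grid, and the toy dataset are assumptions made for illustration only.

```python
import math
import random


def count_errors(threshold, examples):
    """Count examples (x, y) mislabeled by the rule: predict 1 iff x >= threshold."""
    return sum(1 for x, y in examples if (1 if x >= threshold else 0) != y)


def private_threshold(examples, candidates, epsilon, rng):
    """Exponential mechanism over a finite grid (item-level, pure epsilon-DP).

    Replacing one example changes every candidate's error count by at most 1,
    so sampling candidate t with probability proportional to
    exp(-epsilon * errors(t) / 2) satisfies epsilon-DP.
    """
    errors = [count_errors(t, examples) for t in candidates]
    # Shift by the minimum for numerical stability; the normalized
    # sampling distribution is unchanged.
    best = min(errors)
    weights = [math.exp(-epsilon * (e - best) / 2.0) for e in errors]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy data (assumed): 200 evenly spaced points labeled by the threshold 0.5.
examples = [(i / 200, 1 if i >= 100 else 0) for i in range(200)]
candidates = [j / 10 for j in range(11)]
chosen = private_threshold(examples, candidates, epsilon=50.0, rng=random.Random(0))
```

With a large epsilon and clean labels, the mechanism concentrates on the zero-error candidate. Smaller values of epsilon spread probability over candidates with more errors, which is the privacy cost that sample complexity bounds quantify.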
