Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy

Bo Li; Wei Wang; Peng Ye

TL;DR

This work studies pure differential privacy in agnostic learning, giving near-optimal item-level sample complexity bounds and tighter user-level bounds than prior work. The authors develop a surrogate-error learning approach for item-level DP and extend it to user-level DP by exploiting a total-variation amplification property of Binomial distributions, which reduces the number of users required. They also obtain a near-optimal bound for learning thresholds under user-level privacy via a private median-based binary search. Together, the results lower the data cost of privacy, approach the non-private sample complexity in high-accuracy regimes, and point to directions for further tightening the bounds for general concept classes.

Abstract

Machine learning has made remarkable progress in a wide range of fields. In many scenarios, learning is performed on datasets containing sensitive information, so privacy protection is essential for learning algorithms. In this work, we study pure private learning in the agnostic model, a framework that reflects how learning proceeds in practice. We examine the number of users required under item-level privacy (where each user contributes one example) and user-level privacy (where each user contributes multiple examples) and derive several improved upper bounds. For item-level privacy, our algorithm achieves a near-optimal bound for general concept classes. We extend this to the user-level setting, obtaining a tighter upper bound than the one proved by Ghazi et al. (2023). Lastly, we consider the problem of learning thresholds under user-level privacy and present an algorithm with nearly tight user complexity.
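
To make the item-level setting concrete, the following is a minimal Python sketch of a generic pure-DP agnostic learner: the exponential mechanism applied to empirical error counts over a finite grid of threshold hypotheses. It is not the surrogate-error algorithm or the median-based binary search developed in the paper; it is a textbook baseline, and the names `exponential_mechanism` and `private_threshold_learner`, as well as the finite candidate grid, are assumptions introduced here for illustration only.

```python
import math
import random


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity))."""
    # Subtracting the maximum score keeps the exponentials from overflowing
    # and does not change the sampling distribution.
    max_score = max(scores)
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity)) for s in scores
    ]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def private_threshold_learner(examples, thresholds, epsilon, rng):
    """Illustrative baseline (hypothetical helper, not the paper's algorithm).

    Each hypothesis predicts 1 when x >= t. The score of a hypothesis is the
    negated number of misclassified examples. Under item-level privacy each
    user contributes one example, so replacing one user changes every score
    by at most 1, which is the sensitivity passed below.
    """
    scores = []
    for t in thresholds:
        errors = sum(1 for x, y in examples if (1 if x >= t else 0) != y)
        scores.append(-errors)
    idx = exponential_mechanism(scores, epsilon, 1.0, rng)
    return thresholds[idx]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: labels follow a threshold at 50 with no noise.
    data = [(x, 1 if x >= 50 else 0) for x in range(100)]
    grid = list(range(0, 101, 10))
    print(private_threshold_learner(data, grid, epsilon=1.0, rng=rng))
```

In this sketch, if each user contributed m examples instead of one, replacing a single user could change an error count by up to m, so the sensitivity argument would have to grow to m. That naive scaling is the kind of cost that user-level analyses aim to improve on.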

Paper Structure (24 sections, 18 theorems, 92 equations, 1 table, 2 algorithms)

This paper contains 24 sections, 18 theorems, 92 equations, 1 table, 2 algorithms.

Introduction
Our Results
Technical Overview
Related Work
Preliminaries
Learning
Probabilistic Representation Dimension
Tools from Differential Privacy
Item-Level Privacy
User-Level Privacy
Learning Thresholds with User-level Privacy
Conclusion
Additional Preliminaries
The Vapnik-Chervonenkis Dimension
Concentration Bounds
...and 9 more sections

Key Result

Lemma 2.4

For any concept class $\mathcal{C}$, we have $\mathrm{RepD}_{\alpha,\beta}(\mathcal{C}) = O(\log(1/\alpha)\cdot(\mathrm{RepD}(\mathcal{C}) + \log\log\log(1/\alpha) + \log\log(1/\beta)))$ for $0<\alpha,\beta<1$.

Theorems & Definitions (31)

Definition 2.1: Differential Privacy (dwork2006calibrating; dwork2006our)
Definition 2.2: Agnostic Learning
Definition 2.3: Probabilistic Representation Dimension
Lemma 2.4: Boosting Probabilistic Representation
Lemma 2.5: The Laplace Mechanism
Lemma 2.6: The Exponential Mechanism
Lemma 3.1: VC Agnostic Generalization Bound
Lemma 3.2: VC Realizable Generalization Bound
Lemma 3.3 (shalev2014understanding)
Theorem 3.4
...and 21 more
