Advances in Differential Privacy and Differentially Private Machine Learning

Saswat Das; Subhankar Mishra

TL;DR

This survey consolidates advances in differential privacy and differentially private machine learning, emphasizing theory, novel DP variants (e.g., Rényi DP, CDP, and truncated CDP), and practical DP mechanisms. It details DP-ERM and DP-SGD as core learning paradigms, discusses PATE and federated approaches, and surveys industrial deployments (e.g., Google, Apple, Microsoft, Uber) to illustrate real-world DP adoption. A bibliometric analysis highlights the rapid growth of DPML research, underscoring the field’s increasing importance in both theory and practice. Overall, the paper situates DP and DPML as a mature but actively evolving framework, balancing privacy guarantees with utility in high-stakes data-driven settings.

Abstract

There has been an explosion of research on differential privacy (DP) and its various applications in recent years, ranging from novel variants and accounting techniques in differential privacy to the thriving field of differentially private machine learning (DPML) to newer implementations in practice, like those by various companies and organisations such as census bureaus. Most recent surveys focus on the applications of differential privacy in particular contexts like data publishing, specific machine learning tasks, analysis of unstructured data, location privacy, etc. This work thus seeks to fill the gap for a survey that primarily discusses recent developments in the theory of differential privacy along with newer DP variants, viz. Rényi DP and Concentrated DP, novel mechanisms and techniques, and the theoretical developments in differentially private machine learning in proper detail. In addition, this survey discusses its applications to privacy-preserving machine learning in practice and a few practical implementations of DP.

Paper Structure (39 sections, 8 theorems, 22 equations, 5 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Our Contribution
Definitions, Mechanisms, and Variants
Pure and Approximate Differential Privacy and Prerequisite Definitions
Basic Properties of $(\varepsilon,\delta)$-DP
Tighter Composition Bounds
Basic Mechanisms
Differentially Private Algorithms
Sparse Vector Technique (SVT)
Privacy Amplification via Subsampling
Privacy Amplification via Shuffling
Applications in Machine Learning
Differentially Private ERM
Output Perturbation
...and 24 more sections

Key Result

Theorem 1.2.3

Post-Processing Invariance. Let $\mathcal{M}:\mathbb{N}^{|\mathcal{X}|}\to R$ be a randomised algorithm that is $(\varepsilon,\delta)$-differentially private. Let $f:R\to R'$ be an arbitrary randomised mapping. Then $f\circ\mathcal{M}:\mathbb{N}^{|\mathcal{X}|}\to R'$ is $(\varepsilon,\delta)$-differentially private.
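
As a rough illustration of post-processing invariance, the sketch below releases a counting query with the Laplace mechanism and then rounds and clamps the noisy answer. It is not taken from the paper: the function names (`laplace_noise`, `private_count`, `post_process`), the example data, and the choice of $\varepsilon$ are all assumptions made for this example. The point is that `post_process` never touches the raw data, so the rounded output should carry the same $(\varepsilon, 0)$ guarantee as the noisy count.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    # If E1, E2 ~ Exp(rate = 1/scale) are independent, E1 - E2 ~ Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(data, predicate, epsilon: float, rng: random.Random) -> float:
    """Hypothetical mechanism M: Laplace mechanism for a counting query.

    A count changes by at most 1 when one record is added or removed,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def post_process(noisy_count: float) -> int:
    """Hypothetical mapping f: depends only on M's output, never on the data."""
    return max(0, round(noisy_count))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility of the demo only
    ages = [23, 35, 41, 52, 67, 29, 44]  # made-up example data
    noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
    print("noisy count:", noisy)
    print("released count:", post_process(noisy))
```

This floating-point sampler is meant only to show the structure of $f\circ\mathcal{M}$; a real deployment would need a vetted DP library, since naive floating-point Laplace sampling is known to leak information.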

Figures (5)

Figure 1: Timeline of Important Definitions and Developments
Figure 2: Privacy Loss Graphs for Pure and Approximate Differential Privacy.
Figure 3: Diagram illustrating the working of PATE (Papernot et al., 2018).
Figure 4: An Overview of the Federated Learning Process. The central server provides the clients with a global, initial model. The clients train the model on their local data and send the result of their local training (local updates) back to the server. The server then aggregates the local updates and updates the global model.
Figure 5: Bar graph showing the number of DP and DPML publications per year; the 2022 statistics are as of June 2022.

Theorems & Definitions (18)

Definition 1.2.1
Definition 1.2.2
Theorem 1.2.3
Theorem 1.2.4
Theorem 1.2.5
Definition 1.2.6
Theorem 1.2.7
Theorem 1.2.8
Definition 1.2.9
Definition 1.2.10
...and 8 more
