Sample-Efficient Private Learning of Mixtures of Gaussians

Hassan Ashtiani, Mahbod Majid, Shyam Narayanan

TL;DR

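The paper gives the first optimal sample-complexity bound for privately learning mixtures of $k$ univariate (i.e., $1$-dimensional) Gaussians, together with a bound for mixtures of $k$ arbitrary $d$-dimensional Gaussians that is provably optimal when $d$ is much larger than $k^2$.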

Abstract

We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$ samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly $k^2 d^4$ samples) and is provably optimal when $d$ is much larger than $k^2$. Moreover, we give the first optimal bound for privately learning mixtures of $k$ univariate (i.e., $1$-dimensional) Gaussians. Importantly, we show that the sample complexity for privately learning mixtures of univariate Gaussians is linear in the number of components $k$, whereas the previous best sample complexity [AAL21] was quadratic in $k$. Our algorithms utilize various techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23], sample compression for distributions [ABDH+20], and methods for bounding volumes of sumsets.
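
To get a feel for the improvement stated in the abstract, the sketch below evaluates only the leading terms $kd^2 + k^{1.5} d^{1.75} + k^2 d$ and the earlier $k^2 d^4$. It assumes all dependence on the accuracy and privacy parameters, constants, and logarithmic factors is dropped, so the numbers are illustrative rather than actual sample counts. The function names are hypothetical, and the formula applies to the general $d$-dimensional bound, not to the separate univariate result.

```python
# Illustrative only: leading terms of the bounds quoted in the abstract.
# Assumption: constants, log factors, and the accuracy/privacy parameters
# are ignored, so these are not real sample sizes.

def new_bound_terms(k: int, d: int) -> dict:
    """Return the three leading terms of the new bound, keyed by label."""
    return {
        "k d^2": k * d**2,
        "k^1.5 d^1.75": k**1.5 * d**1.75,
        "k^2 d": k**2 * d,
    }

def new_bound(k: int, d: int) -> float:
    """Sum of the leading terms k*d^2 + k^1.5*d^1.75 + k^2*d."""
    return sum(new_bound_terms(k, d).values())

def previous_bound(k: int, d: int) -> float:
    """Leading term k^2 * d^4 of the earlier result [AAL24b]."""
    return k**2 * d**4

def dominant_term(k: int, d: int) -> str:
    """Label of the largest of the three terms for the given k and d."""
    terms = new_bound_terms(k, d)
    return max(terms, key=terms.get)

if __name__ == "__main__":
    for k, d in [(2, 1000), (16, 16), (100, 2)]:
        ratio = previous_bound(k, d) / new_bound(k, d)
        print(f"k={k}, d={d}: dominant term {dominant_term(k, d)}, "
              f"previous/new ratio about {ratio:.1f}")
```

Comparing the terms pairwise shows that $kd^2$ is the largest of the three whenever $d > k^2$, which is the regime in which the abstract states the bound is provably optimal.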

Paper Structure

This paper contains 48 sections, 49 theorems, 61 equations.

Key Result

Theorem 1.4

For any $\alpha, \varepsilon, \delta \in (0,1), k, d \in \mathbb N$, there exists an inefficient $(\varepsilon, \delta)$-DP algorithm that can learn a mixture of $k$ arbitrary full-dimensional Gaussians in $d$ dimensions up to accuracy $\alpha$, using the following number of samples:

Theorems & Definitions (88)

  • Definition 1.1: Learning GMMs
  • Definition 1.2: Differential Privacy (DP) [dwork2006calibrating, dwork2006our]
  • Definition 1.3: Privately learning GMMs
  • Theorem 1.4
  • Theorem 1.5
  • Theorem 1.6
  • Theorem 2.1 (informal; see \ref{thm:approx_dp_general_main} for the formal statement)
  • Theorem A.1: Advanced Composition Theorem [dworkrothbook]
  • Definition A.2: Truncated Laplace Distribution
  • Lemma A.3: Truncated Laplace Mechanism [geng2020tight]
  • ...and 78 more
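
For orientation, the notion named in Definition 1.2 is usually stated as below. This is the standard textbook form of $(\varepsilon, \delta)$-differential privacy, not a quotation from the paper, whose exact wording and notation may differ; the symbols $\mathcal{A}$, $X$, $X'$, and $S$ are chosen here for illustration.

$$
\Pr[\mathcal{A}(X) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(X') \in S] + \delta
$$

Here $\mathcal{A}$ is a randomized algorithm, $X$ and $X'$ are any two datasets that differ in a single element, and $S$ is any measurable set of outputs.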