Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples

Mohammad Afzali; Hassan Ashtiani; Christopher Liaw

TL;DR

This paper provides the first finite sample complexity upper bound for privately learning general Gaussian mixture models (GMMs) without restrictive structural assumptions. It introduces a general reduction: if a base class admits a $(t,2\alpha/15)$-locally small $\alpha/15$-cover in total variation (TV) distance and is list-decodable, then its $k$-mixtures are privately learnable with $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples. Key innovations include a private common-member selector (PCMS), a component-wise distance $\kappa_{\mathrm{mix}}$ for mixtures, and a compression-based list-decoding approach for Gaussians. A locally small cover for Gaussians is constructed via ball-cover techniques around $N(0,I_d)$ and TV-distance bounds, which makes private learning of GMMs possible even though mixtures themselves lack a locally small cover in TV distance. The results bridge private density estimation and mixture modeling and offer a principled pathway to DP learning of complex distribution classes, although this work does not give a computationally efficient algorithm.

Abstract

We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
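
For reference, the two notions the abstract relies on can be written out as follows. These are the standard textbook definitions of total variation distance and $(\varepsilon,\delta)$-differential privacy, not statements quoted from the paper; the symbols $\mathcal{M}$ (the randomized algorithm), $D, D'$ (neighboring datasets), and $S$ (a measurable set of outputs) are notation assumed here for illustration.

```latex
% Total variation distance between distributions f and g on R^d
% (the second equality holds when f and g have densities):
d_{\mathrm{TV}}(f, g)
  = \sup_{A \subseteq \mathbb{R}^d} \lvert f(A) - g(A) \rvert
  = \frac{1}{2} \int_{\mathbb{R}^d} \lvert f(x) - g(x) \rvert \, dx .

% (epsilon, delta)-differential privacy: for all datasets D, D'
% that differ in a single entry and all measurable output sets S,
\Pr[\mathcal{M}(D) \in S]
  \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
```

In this notation, the main result says that an $(\varepsilon,\delta)$-DP algorithm can output an estimate within total variation distance $\alpha$ of the unknown mixture using $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples.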

Paper Structure (20 sections, 24 theorems, 24 equations, 1 algorithm)

Introduction
Preliminaries
Distribution learning and list decodable learning
Differential Privacy
Main Results
Technical Challenges and Contributions
Common Member Selection
Mixtures and Their Properties
Component-wise distance between mixtures
Dense mixtures
Proof of the Main Reduction
Privately Learning GMMs
List-decoding Gaussians using compression
A locally small cover for Gaussians
Learning GMMs
...and 5 more sections

Key Result

Theorem 2.9

Let $\alpha,\beta \in (0,1)$. Given a finite class of distributions $\mathcal{F}$, there is an algorithm that, upon receiving $O\left(\frac{\log|\mathcal{F}|+\log(1/\beta)}{\alpha^2}\right)$ i.i.d. samples from a distribution $g$, returns an $\hat{f}\in \mathcal{F}$ such that $d_{\mathrm{TV}}$...
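
To get a feel for how the sample bound in Theorem 2.9 scales, the expression $\frac{\log|\mathcal{F}|+\log(1/\beta)}{\alpha^2}$ can be evaluated directly. The sketch below is only an illustration: the function name `finite_class_sample_bound` and the constant `c` are assumptions introduced here, since the theorem hides the constant inside the $O(\cdot)$, so the returned numbers show scaling rather than actual sample requirements.

```python
import math


def finite_class_sample_bound(class_size: int, alpha: float, beta: float,
                              c: float = 1.0) -> int:
    """Evaluate c * (log|F| + log(1/beta)) / alpha^2, rounded up.

    `c` stands in for the unspecified constant hidden by the O(.) notation
    (an assumption for illustration), so only the scaling is meaningful.
    """
    if class_size < 1:
        raise ValueError("class_size must be at least 1")
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie in (0, 1)")
    numerator = math.log(class_size) + math.log(1.0 / beta)
    return math.ceil(c * numerator / alpha ** 2)


if __name__ == "__main__":
    # The bound grows logarithmically in |F| and quadratically in 1/alpha.
    for size in (10**3, 10**6):
        for alpha in (0.2, 0.1):
            n = finite_class_sample_bound(size, alpha, beta=0.05)
            print(f"|F|={size:>8}, alpha={alpha}: {n}")
```

The logarithmic dependence on $|\mathcal{F}|$ is what makes this kind of result useful together with covers: even a very large finite candidate class adds only a modest number of samples.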

Theorems & Definitions (63)

Definition 2.1: $\kappa$-ball
Definition 2.2: $\alpha$-cover
Definition 2.3: Locally small cover (Bun et al., 2021)
Definition 2.4: $k$-mixtures
Definition 2.5: Unbounded Gaussians
Definition 2.6: List decodable learning under Huber's contamination
Definition 2.7: PAC learning
Remark 2.8
Theorem 2.9: Learning finite classes, Theorem 6.3 of Devroye and Lugosi (2001)
Definition 2.10: $(\varepsilon,\delta)$-Indistinguishable
...and 53 more
