Differentially Private Distribution Release of Gaussian Mixture Models via KL-Divergence Minimization

Hang Liu, Anna Scaglione, Sean Peisert

TL;DR

This work tackles the privacy risks of releasing Gaussian Mixture Model (GMM) parameters by adopting KL divergence as a unified utility metric and designing a differentially private (DP) distribution-release mechanism. It introduces a two-step pipeline that fits a GMM to data and then perturbs mixture weights, means, and covariances using a combination of Gaussian, Wishart, and discrete-mapping noise, all governed by a DP budget. A tractable KL-divergence–constrained optimization allocates privacy budgets and optimizes noise statistics via an alternating scheme, with closed-form DP bounds and a reduced-complexity implementation. Extensive experiments on synthetic and real datasets (Iris, AMI load data, MNIST) demonstrate superior utility/privacy trade-offs compared with baselines, validating KL divergence as an effective utility proxy for DP-GMM release and its practical impact for data sharing and synthetic data generation.
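The perturbation step of the pipeline can be pictured with a minimal sketch. It assumes the GMM has already been fitted and only shows noise being added to each parameter group. The function name `perturb_gmm`, its arguments, and the noise scales are hypothetical and are not calibrated to any DP budget. Which noise type is applied to which parameter is also an assumption here (Gaussian noise on weights and means, additive Wishart noise on covariances), and the discrete-mapping component mentioned above is omitted.

```python
import numpy as np

def perturb_gmm(weights, means, covs, sigma_w, sigma_mu,
                wishart_dof, wishart_scale, rng):
    """Return a noisy copy of GMM parameters.

    Illustrative only: the noise scales are free inputs and are NOT
    derived from an (epsilon, delta) budget.
    """
    K, d = means.shape

    # Weights: add Gaussian noise, clip to stay positive, renormalize to sum to 1.
    noisy_w = np.clip(weights + rng.normal(0.0, sigma_w, size=K), 1e-12, None)
    noisy_w = noisy_w / noisy_w.sum()

    # Means: add isotropic Gaussian noise to every coordinate.
    noisy_mu = means + rng.normal(0.0, sigma_mu, size=(K, d))

    # Covariances: add a Wishart-distributed matrix G @ G.T, where G is
    # d x wishart_dof with i.i.d. N(0, wishart_scale) entries. The added
    # term is positive semidefinite, so the result stays a valid covariance.
    noisy_cov = np.empty_like(covs)
    for k in range(K):
        G = rng.normal(0.0, np.sqrt(wishart_scale), size=(d, wishart_dof))
        noisy_cov[k] = covs[k] + G @ G.T

    return noisy_w, noisy_mu, noisy_cov

# Hypothetical two-component, two-dimensional GMM.
rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
covs = np.stack([np.eye(2), 0.5 * np.eye(2)])

noisy_w, noisy_mu, noisy_cov = perturb_gmm(
    weights, means, covs,
    sigma_w=0.05, sigma_mu=0.1, wishart_dof=3, wishart_scale=0.01, rng=rng,
)
```

In the paper's setting the noise statistics would come out of the budget-allocation optimization rather than being chosen by hand as they are in this sketch.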

Abstract

Gaussian Mixture Models (GMMs) are widely used statistical models for representing multi-modal data distributions, with numerous applications in data mining, pattern recognition, data simulation, and machine learning. However, recent research has shown that releasing GMM parameters poses significant privacy risks, potentially exposing sensitive information about the underlying data. In this paper, we address the challenge of releasing GMM parameters while ensuring differential privacy (DP) guarantees. Specifically, we focus on the privacy protection of mixture weights, component means, and covariance matrices. We propose to use Kullback-Leibler (KL) divergence as a utility metric to assess the accuracy of the released GMM, as it captures the joint impact of noise perturbation on all the model parameters. To achieve privacy, we introduce a DP mechanism that adds carefully calibrated random perturbations to the GMM parameters. Through theoretical analysis, we quantify the effects of privacy budget allocation and perturbation statistics on the DP guarantee, and derive a tractable expression for evaluating KL divergence. We formulate and solve an optimization problem to minimize the KL divergence between the released and original models, subject to a given $(\epsilon, \delta)$-DP constraint. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach achieves strong privacy guarantees while maintaining high utility.
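The KL divergence between two GMMs has no general closed form, which is why a tractable expression matters for the optimization. As a point of reference, a plain Monte Carlo estimate can be sketched as follows. This is not the paper's tractable expression. The helper names (`gmm_logpdf`, `sample_gmm`, `kl_monte_carlo`) are hypothetical, and the direction KL(original || released) is an assumption, since the abstract does not state which direction is used.

```python
import numpy as np

def gmm_logpdf(x, weights, means, covs):
    """Log-density of a GMM evaluated at the rows of x (shape n x d)."""
    n, d = x.shape
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        L = np.linalg.cholesky(cov)            # cov = L @ L.T
        z = np.linalg.solve(L, (x - mu).T)     # whitened residuals, d x n
        maha = np.sum(z ** 2, axis=0)          # squared Mahalanobis distances
        log_det = 2.0 * np.sum(np.log(np.diag(L)))
        log_terms.append(
            np.log(w) - 0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)
        )
    log_terms = np.stack(log_terms)            # K x n
    m = log_terms.max(axis=0)                  # log-sum-exp for stability
    return m + np.log(np.exp(log_terms - m).sum(axis=0))

def sample_gmm(n, weights, means, covs, rng):
    """Draw n samples from a GMM."""
    comps = rng.choice(len(weights), size=n, p=weights)
    x = np.empty((n, means.shape[1]))
    for k in range(len(weights)):
        idx = comps == k
        x[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.sum())
    return x

def kl_monte_carlo(p, q, n, rng):
    """Estimate KL(p || q) by averaging log p(x) - log q(x) over x ~ p."""
    x = sample_gmm(n, *p, rng)
    return np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q))

# Hypothetical "original" and "released" models; the second only shifts the means.
rng = np.random.default_rng(0)
original = (np.array([0.6, 0.4]),
            np.array([[0.0, 0.0], [3.0, 3.0]]),
            np.stack([np.eye(2), 0.5 * np.eye(2)]))
released = (original[0], original[1] + 0.2, original[2])

kl_estimate = kl_monte_carlo(original, released, n=20000, rng=rng)
```

The estimate is noisy and costs one density evaluation per sample, so an optimization loop that evaluates the objective many times would benefit from a closed-form or otherwise tractable surrogate.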

Paper Structure

This paper contains 21 sections, 26 equations, 13 figures, 1 table, and 1 algorithm.

Figures (13)

  • Figure 1: Average KL divergence and confidence interval for the proposed method across $200$ Monte Carlo trials. The privacy level is represented by the value of $\epsilon$.
  • Figure 2: KL divergence versus the privacy level in terms of the value of $\epsilon$.
  • Figure 3: KL divergence versus the data size $N$ with $\epsilon=1$.
  • Figure 4: KL divergence versus the number of classes $K$, where the total data size is fixed to $N=1000$.
  • Figure 5: KL divergence versus the dimension of the data points $d$ with $K=5$ and $N=1000$.
  • ...and 8 more figures