
Differentially Private Truncation of Unbounded Data via Public Second Moments

Zilong Cao, Xuan Bi, Hai Zhang

TL;DR

This work proposes Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size.

Abstract

Data privacy is important in the AI era, and differential privacy (DP) is one of the leading solutions. However, DP is typically applicable only when the data come from a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, so that its inverse is significantly more robust to DP noise. Furthermore, we demonstrate the applicability of PMT on penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms that ensure solutions in the transformed space can be mapped back to the original domain. We establish improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, and attribute the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
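The transform-then-truncate idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `pmt_transform`, the choice to whiten with the inverse square root of the public second-moment matrix, truncation implemented as norm clipping, and the placeholder radius `sqrt(d * log(n))`. The abstract only states that the radius depends on the dimension and sample size; the paper's actual formula is not reproduced here.

```python
import numpy as np

def pmt_transform(x_private, sigma_public, radius):
    """Whiten rows with a public second-moment matrix, then clip their norms.

    Illustrative sketch only; not the paper's exact procedure.
    """
    # Inverse square root of the symmetric positive-definite public matrix.
    eigvals, eigvecs = np.linalg.eigh(sigma_public)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    z = x_private @ inv_sqrt  # each row becomes Sigma_pub^{-1/2} x_i
    # Clip: rows with norm above `radius` are rescaled onto the ball.
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return z * scale

rng = np.random.default_rng(0)
d, n_pub, n_priv = 5, 200, 2000
scales = np.diag([1.0, 2.0, 4.0, 8.0, 16.0])  # badly scaled synthetic features
x_pub = rng.normal(size=(n_pub, d)) @ scales
x_priv = rng.normal(size=(n_priv, d)) @ scales

sigma_pub = x_pub.T @ x_pub / n_pub       # public second-moment matrix
radius = np.sqrt(d * np.log(n_priv))      # hypothetical placeholder radius
z = pmt_transform(x_priv, sigma_pub, radius)

cond_raw = np.linalg.cond(x_priv.T @ x_priv / n_priv)
cond_pmt = np.linalg.cond(z.T @ z / n_priv)
print(f"condition number before: {cond_raw:.1f}, after: {cond_pmt:.2f}")
```

On this synthetic data the transformed second-moment matrix is close to the identity, which is the conditioning effect the abstract credits for the improved robustness to DP noise. No noise is added in this sketch, so it is not itself differentially private.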

Paper Structure (43 sections, 27 theorems, 139 equations, 10 figures, 5 algorithms)


Key Result

Lemma 2.1

A mechanism $\mathcal{M}$ satisfies $\mu$-GDP if and only if it is $(\epsilon,\delta(\epsilon))$-DP for all $\epsilon \geqslant 0$, where $\delta(\epsilon) = \Phi\left(-\frac{\epsilon}{\mu} + \frac{\mu}{2}\right) - e^{\epsilon}\, \Phi\left(-\frac{\epsilon}{\mu} - \frac{\mu}{2}\right)$ and $\Phi$ denotes the standard normal cumulative distribution function.
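As a quick numerical check of the lemma, the conversion from a GDP parameter $\mu$ to a $\delta$ at a chosen $\epsilon$ can be sketched as follows. The helper names `std_normal_cdf` and `gdp_delta` are illustrative, and the sketch assumes the function applied to the two arguments in $\delta(\epsilon)$ is the standard normal CDF.

```python
from math import erf, exp, sqrt

def std_normal_cdf(t):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def gdp_delta(mu, epsilon):
    """delta(epsilon) such that a mu-GDP mechanism is (epsilon, delta)-DP."""
    return (std_normal_cdf(-epsilon / mu + mu / 2.0)
            - exp(epsilon) * std_normal_cdf(-epsilon / mu - mu / 2.0))

# For a fixed mu, delta shrinks as epsilon grows.
for eps in (0.0, 1.0, 2.0):
    print(f"mu=1.0, epsilon={eps}: delta={gdp_delta(1.0, eps):.4f}")
```

Scanning over $\epsilon$ in this way traces out the whole family of $(\epsilon,\delta)$ guarantees implied by a single $\mu$.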

Figures (10)

  • Figure 1: Simulations on DP-PMTRR, DP-RR and DP-GD.
  • Figure 2: Simulations with different private data sizes and privacy parameters.
  • Figure 3: Simulations with different regularization parameters.
  • Figure 4: Real-world experiments on DP-PMTRR, DP-RR and DP-GD.
  • Figure 5: Real-world experiments with different private data sizes and privacy parameters.
  • ...and 5 more figures

Theorems & Definitions (55)

  • Definition 2.1: Differential Privacy dwork2014algorithmic
  • Definition 2.2: $f$-Differential Privacy dong2022gaussian
  • Definition 2.3: Gaussian Differential Privacy dong2022gaussian
  • Lemma 2.1: Corollary 1 in dong2022gaussian
  • Theorem 2.1: The n-fold composition, Corollary 2 in dong2022gaussian
  • Definition 2.4: Sensitivity
  • Theorem 2.2: Gaussian Mechanism in dong2022gaussian
  • Theorem 3.1: Bound the second-moment matrix
  • Corollary 3.1: Utility of truncation
  • Remark 3.1
  • ...and 45 more
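
Two of the listed background results, the Gaussian mechanism and n-fold composition under GDP, can be sketched in a few lines. The sketch relies on the standard statements from the Gaussian differential privacy literature, which this page lists but does not reproduce: adding $N(0, \Delta^2/\mu^2)$ noise to a statistic with sensitivity $\Delta$ gives $\mu$-GDP, and composing $\mu_i$-GDP mechanisms gives $\sqrt{\sum_i \mu_i^2}$-GDP. The function names `gaussian_mechanism` and `compose_gdp` are hypothetical.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, mu, rng):
    """Release `value` with Gaussian noise of scale sensitivity / mu (mu-GDP)."""
    value = np.asarray(value, dtype=float)
    noise = rng.normal(loc=0.0, scale=sensitivity / mu, size=value.shape)
    return value + noise

def compose_gdp(mus):
    """Overall GDP parameter after composing mechanisms with parameters `mus`."""
    return float(np.sqrt(np.sum(np.square(mus))))

rng = np.random.default_rng(0)
released = gaussian_mechanism(np.array([1.0, 2.0]), sensitivity=0.5, mu=1.0, rng=rng)
total_mu = compose_gdp([0.3, 0.4])

# Splitting a total budget evenly over n_steps releases.
n_steps, budget = 4, 1.0
per_step_mu = budget / np.sqrt(n_steps)
print(released, total_mu, per_step_mu)
```

The even split in the last lines is one simple way to allocate a total budget across several noisy releases; other allocations are possible.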