A Private Approximation of the 2nd-Moment Matrix of Any Subsamplable Input

Bar Mahpud, Or Sheffet

TL;DR

The paper tackles private estimation of the second moment matrix Σ under zero-Concentrated Differential Privacy by introducing a subsamplability framework and a recursive private estimator RecDPSME. The method builds on a baseline subsample-and-aggregate approach and then iteratively shrinks large-eigenvalue directions while controlling outliers, achieving a (1±γ) spectral approximation with high probability. It applies to both distributional inputs and contaminated data, providing concrete sample-complexity bounds and demonstrating robust performance for heavy-tailed distributions and mixtures with outliers. The work situates its contributions relative to prior private covariance estimators, notably offering improved tolerance to high-leverage points and outliers while maintaining strong privacy-utility guarantees. Overall, the framework enables accurate private second-moment estimation in challenging high-dimensional settings where conventional DP methods struggle with outliers and ill-conditioned spectra.

Abstract

We study the problem of differentially private second moment estimation and present a new algorithm that achieves strong privacy-utility trade-offs even for worst-case inputs, under subsamplability assumptions on the data. We call an input $(m,\alpha,\beta)$-subsamplable if a random subsample of size $m$ (or larger) preserves, with probability $\geq 1-\beta$, the spectral structure of the original second moment matrix up to a multiplicative factor of $1\pm\alpha$. Building upon subsamplability, we give a recursive algorithmic framework, similar to that of Kamath et al. (2019), that satisfies zero-Concentrated Differential Privacy (zCDP) while preserving, with high probability, the accuracy of the second moment estimation up to an arbitrary factor of $(1\pm\gamma)$. We then show how to apply our algorithm to approximate the second moment matrix of a distribution $\mathcal{D}$, even when a noticeable fraction of the input points are outliers.
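The subsamplability condition can be probed empirically. The following is a minimal, non-private sketch, not the paper's algorithm: it assumes that "preserves the spectral structure up to $1\pm\alpha$" means $(1-\alpha)\Sigma \preceq \hat{\Sigma} \preceq (1+\alpha)\Sigma$ for the subsample's second moment matrix $\hat{\Sigma}$, that subsamples are drawn without replacement, and that $\Sigma$ is invertible. The function names and the Monte Carlo setup are hypothetical.

```python
import numpy as np

def second_moment(X):
    """Empirical second moment matrix (1/n) * sum_i x_i x_i^T for the rows x_i of X."""
    return X.T @ X / X.shape[0]

def spectral_ratio_bounds(sigma, sigma_hat):
    """Smallest and largest eigenvalue of sigma^{-1/2} sigma_hat sigma^{-1/2}.

    Assumes sigma is symmetric positive definite.
    """
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    whitened = inv_sqrt @ sigma_hat @ inv_sqrt
    # Symmetrize to remove floating-point asymmetry before the eigensolve.
    w = np.linalg.eigvalsh((whitened + whitened.T) / 2)
    return w[0], w[-1]

def estimate_failure_rate(X, m, alpha, trials=200, seed=0):
    """Monte Carlo estimate of the probability that a size-m subsample
    violates (1 - alpha) * Sigma <= Sigma_hat <= (1 + alpha) * Sigma.

    Assumption: the Loewner-order reading of the (1 +/- alpha) condition.
    """
    rng = np.random.default_rng(seed)
    sigma = second_moment(X)
    failures = 0
    for _ in range(trials):
        idx = rng.choice(X.shape[0], size=m, replace=False)
        lo, hi = spectral_ratio_bounds(sigma, second_moment(X[idx]))
        if lo < 1 - alpha or hi > 1 + alpha:
            failures += 1
    return failures / trials
```

If the estimated failure rate at subsample size $m$ is at most $\beta$, the data behaves as if it were $(m,\alpha,\beta)$-subsamplable under the assumptions above. This is only an empirical indication, not a proof, and the check itself is not differentially private.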

Paper Structure

This paper contains 30 sections, 20 theorems, 69 equations, 4 algorithms.

Key Result

Theorem 2.4

If a randomized algorithm $\mathcal{A}$ satisfies $\rho$-zero-concentrated differential privacy ($\rho$-zCDP), then $\mathcal{A}$ also satisfies $(\epsilon, \delta)$-differential privacy for any $\delta > 0$, where $\epsilon = \rho + 2\sqrt{\rho \ln\left(\frac{1}{\delta}\right)}$.
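As a minimal sketch of how this conversion is used in practice, the snippet below computes $\epsilon$ from $\rho$ and $\delta$ using the bound $\epsilon = \rho + 2\sqrt{\rho \ln(1/\delta)}$ from Bun and Steinke (2016). It also includes the standard additive composition rule for zCDP (the $\rho$ parameters of sequentially composed mechanisms add up), which is assumed here to be the content of the composition theorem. The function names are hypothetical.

```python
import math

def zcdp_to_approx_dp(rho, delta):
    """Epsilon such that rho-zCDP implies (epsilon, delta)-DP.

    Uses epsilon = rho + 2 * sqrt(rho * ln(1 / delta)) (Bun and Steinke, 2016).
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

def compose_zcdp(rhos):
    """Total zCDP parameter of sequentially composed mechanisms.

    Assumption: standard additive composition, rho_total = sum of the rhos.
    """
    return sum(rhos)

# Example: three steps of a recursive procedure, each with its own budget.
total_rho = compose_zcdp([0.05, 0.05, 0.1])
epsilon = zcdp_to_approx_dp(total_rho, delta=1e-6)
```

Tracking the budget in terms of $\rho$ and converting to $(\epsilon, \delta)$ only once at the end is usually tighter than converting each step separately and composing the resulting $(\epsilon, \delta)$ guarantees.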

Theorems & Definitions (45)

  • Definition 1.1
  • Definition 2.2: Differential Privacy [Dwork06]
  • Definition 2.3: Zero-Concentrated Differential Privacy (zCDP) [Bun and Steinke, 2016]
  • Theorem 2.4: [Bun and Steinke, 2016]
  • Theorem 2.5: Composition Theorem for $\rho$-zCDP
  • Theorem 3.1
  • Definition 3.2
  • Theorem 3.3
  • Lemma 3.4
  • Corollary 3.5
  • ...and 35 more