Sample-Optimal Private Regression in Polynomial Time

Prashanti Anderson, Ainesh Bakshi, Mahbod Majid, Stefan Tiegel

TL;DR

This work resolves the open problem of achieving sample-optimal private regression for Gaussian covariates with unknown covariance in polynomial time, under both pure and approximate differential privacy. The authors combine two ingredients: (i) a sum-of-squares robust regression algorithm that attains optimal robustness rates against outliers under Gaussian assumptions, and (ii) a geometry-aware privacy framework that preserves the relevant covariance geometry in the exponential mechanism, eliminating the need to privately isotropize the data. They show that any improvement to the sample complexity would contradict statistical-query or information-theoretic lower bounds, and that the algorithm tolerates a small fraction of arbitrary outliers with optimal error rates as a function of that fraction. The same framework yields an efficient covariance-aware mean estimation method with optimal dependence on the privacy parameters, illustrating the broader applicability of the robustness-to-privacy reduction.
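For readers unfamiliar with the exponential mechanism mentioned above, the following is a minimal generic sketch over a finite candidate set. It is not the paper's construction: the paper's score function and candidate space are not described in this summary, and the function name `exponential_mechanism`, the toy scores, and the `sensitivity` parameter are illustrative assumptions.

```python
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity)).

    Generic textbook form; `scores` and `sensitivity` are hypothetical
    inputs, not quantities taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()          # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()            # normalize to a probability vector
    idx = int(rng.choice(len(scores), p=probs))
    return idx, probs

rng = np.random.default_rng(0)
# Toy example: three candidates, higher score means higher utility.
idx, probs = exponential_mechanism([0.0, 1.0, 5.0], epsilon=1.0,
                                   sensitivity=1.0, rng=rng)
```

Higher-scoring candidates are exponentially more likely to be selected, and a smaller `epsilon` flattens the distribution toward uniform.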

Abstract

We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaling with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators to be sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.
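To make the problem setup concrete, here is a small non-private, non-robust baseline in Python. It only illustrates the setting described in the abstract (Gaussian covariates with a covariance unknown to the estimator, and a fraction $\eta$ of corrupted samples); it is not the authors' algorithm. The dimensions, noise level, corruption pattern, and the helper `prediction_error` are assumptions chosen for illustration. Prediction error is measured in the covariance-induced norm $\lVert \Sigma^{1/2}(\hat\theta - \theta)\rVert_2$, which is the root mean squared difference between the two predictors on a fresh Gaussian covariate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5                       # illustrative sizes (assumption)

# Covariance is generated here only to simulate data; the estimator never sees it.
A = rng.normal(size=(d, d))
Sigma = A @ A.T + 0.1 * np.eye(d)
theta = rng.normal(size=d)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
y = X @ theta + rng.normal(size=n)   # unit-variance Gaussian noise (assumption)

def prediction_error(theta_hat, theta, Sigma):
    """Return ||Sigma^{1/2} (theta_hat - theta)||_2."""
    diff = theta_hat - theta
    return float(np.sqrt(diff @ Sigma @ diff))

# Ordinary least squares on clean data.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
err_clean = prediction_error(theta_hat, theta, Sigma)

# Corrupt an eta-fraction of the labels arbitrarily and refit.
eta = 0.05
k = int(eta * n)
y_bad = y.copy()
y_bad[:k] = 1e3                      # hypothetical adversarial labels
theta_bad, *_ = np.linalg.lstsq(X, y_bad, rcond=None)
err_bad = prediction_error(theta_bad, theta, Sigma)

print(f"clean OLS error:     {err_clean:.4f}")
print(f"corrupted OLS error: {err_bad:.4f}")
```

Plain least squares degrades sharply under even a small fraction of corrupted labels, which is the failure mode that robust estimators are designed to avoid.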

Paper Structure

This paper contains 100 sections, 61 theorems, 364 equations, 1 figure, 2 tables, 4 algorithms.

Key Result

Theorem 1.4

Let $\theta, \Sigma$ be such that $\lVert\theta\rVert_2 \leqslant R$ and $\Sigma \preceq L I$ for some $R, L$. Given $0 < \alpha, \varepsilon < 1$ and $n$ $\eta$-corrupted samples (as defined in model:robust_regression) with parameters $\theta$ and $\Sigma$, there exists an $\varepsilon$-differentially private […] as long as $\alpha \geqslant \eta \log(1/\eta)$ and […]. In the same setting, […]. Without requiring the bound […]

Theorems & Definitions (139)

  • Definition 1.1: Pure Differential Privacy [dwork2006calibrating]
  • Definition 1.2: Strong Contamination
  • Theorem 1.4: Optimal Private Regression (informal, see thm:main_private_regression)
  • Remark 1.5: On Optimality
  • Theorem 1.6: Covariance-Aware Mean Estimation (informal, see thm:main_private_cov_mean_est)
  • Remark 1.7: On Optimality
  • Definition 4.1
  • Definition 4.2
  • Definition 4.5
  • Lemma 4.6
  • ...and 129 more