Sample-Optimal Private Regression in Polynomial Time
Prashanti Anderson, Ainesh Bakshi, Mahbod Majid, Stefan Tiegel
TL;DR
This work resolves the open problem of achieving sample-optimal private regression for Gaussian covariates with unknown covariance in polynomial time, under both pure and approximate differential privacy. The approach has two parts: (i) a sum-of-squares based robust regression algorithm that attains optimal error rates as a function of the fraction of outliers, and (ii) a geometry-aware extension of the robustness-to-privacy framework that preserves the geometry induced by the covariance inside the exponential mechanism, eliminating the need to privately isotropize the data. The authors show that the sample complexity is tight: any improvement would contradict statistical-query or information-theoretic lower bounds. The resulting algorithm also remains accurate when a small fraction of the samples is adversarially corrupted. The same framework yields an efficient covariance-aware mean estimation algorithm with optimal dependence on the privacy parameters, illustrating that the techniques apply beyond regression.
Abstract
We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms incurred sample complexities with sub-optimal dimension dependence, scaled with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold. First, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, arXiv:2212.05015] to account for the geometry induced by the covariance of the input samples. This framework crucially requires the robust estimators to be sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.
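To make the problem setting concrete, the following is a minimal, non-private sketch of ordinary least squares with Gaussian covariates, with the error measured in the norm induced by the covariance. It is not the paper's algorithm: it has no privacy or robustness guarantees, and the names `prediction_error` and `simulate_ols`, the choice of covariance, and the assumption that prediction error means the covariance-weighted distance between the estimate and the true coefficients are all illustrative assumptions.

```python
import numpy as np


def prediction_error(beta_hat, beta_star, Sigma):
    """Covariance-weighted error ||Sigma^{1/2} (beta_hat - beta_star)||_2.

    Assumption: this is the prediction-error notion meant in the abstract.
    """
    diff = beta_hat - beta_star
    # Clamp at zero to guard against tiny negative values from round-off.
    return float(np.sqrt(max(diff @ Sigma @ diff, 0.0)))


def simulate_ols(n=2000, d=5, noise_std=1.0, seed=0):
    """Non-private OLS on Gaussian covariates with a covariance unknown to the learner."""
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth: an arbitrary positive-definite covariance.
    A = rng.normal(size=(d, d))
    Sigma = A @ A.T + 0.1 * np.eye(d)
    beta_star = rng.normal(size=d)

    # Samples: x ~ N(0, Sigma), y = <x, beta_star> + Gaussian noise.
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    y = X @ beta_star + noise_std * rng.normal(size=n)

    # Plain least squares; the estimator never sees Sigma.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return prediction_error(beta_hat, beta_star, Sigma), Sigma, beta_star


if __name__ == "__main__":
    err, _, _ = simulate_ols()
    print(f"covariance-weighted error of non-private OLS: {err:.4f}")
```

Measuring error in this covariance-weighted norm is unchanged by an invertible linear change of coordinates on the covariates, which is one way to see why a guarantee of this kind need not depend on the condition number of the covariance.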
