
Private Linear Regression with Differential Privacy and PAC Privacy

Hillary Yang, Yuntao Du

TL;DR

This work analyzes privacy-preserving linear regression under two paradigms: differential privacy (DP) with budget $(\epsilon, \delta)$, and PAC Privacy, which bounds the adversary's posterior success via the mutual information $MI$. It introduces PAC-LR, an anisotropic-noise-based PAC Privacy method that uses SVD projections to reduce sensitivity and adds noise directly to the model weights, enabling a fair comparison with DPSGD-LR. In experiments on three real-world datasets, PAC-LR frequently outperforms DPSGD-LR, particularly under stringent privacy guarantees, and the study highlights the critical roles of data normalization and regularization in both approaches. The results offer practical guidance on privacy-utility tradeoffs in private linear regression and point to future work: broadening the comparison to other DP methods, improving sampling efficiency for PAC Privacy, and further exploring how regularization shapes the anisotropic noise. In short, the paper demonstrates that PAC Privacy can yield robust utility for linear models under tight privacy constraints, with anisotropic noise via SVD as a practical technique for deployment.

Abstract

Linear regression is a fundamental tool for statistical analysis, which has motivated the development of linear regression methods that satisfy provable privacy guarantees so that the learned model reveals little about any one data point used to construct it. Most existing privacy-preserving linear regression methods rely on the well-established framework of differential privacy, while the newly proposed PAC Privacy has not yet been explored in this context. In this paper, we systematically compare linear regression models trained with differential privacy and PAC privacy across three real-world datasets, observing several key findings that impact the performance of privacy-preserving linear regression.
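The anisotropic-noise idea summarized in the TL;DR can be illustrated with a minimal sketch. This is not the paper's PAC-LR algorithm. It assumes a ridge-regression learner, random half-size subsampling, and a noise allocation in which the variance along the $i$-th principal direction of the sampled weights is $\sqrt{\lambda_i} \sum_j \sqrt{\lambda_j} / (2\beta)$, where $\lambda_i$ are the eigenvalues of the empirical weight covariance and $\beta$ is a mutual-information budget. The names `ridge_fit`, `anisotropic_noise`, `release_weights`, and all parameters are hypothetical, and the sketch ignores the estimation-error terms a formal guarantee would need.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights: (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def anisotropic_noise(W, beta):
    """Shape Gaussian noise to the spread of the sampled weights W (trials x d).

    Returns the principal directions U, the eigenvalues s of the empirical
    weight covariance, and the per-direction noise variances e.
    Assumed allocation: e_i = sqrt(s_i) * sum_j sqrt(s_j) / (2 * beta).
    """
    C = np.atleast_2d(np.cov(W, rowvar=False))  # (d, d) empirical covariance
    U, s, _ = np.linalg.svd(C)                  # C is symmetric PSD
    root = np.sqrt(np.clip(s, 0.0, None))
    e = root * root.sum() / (2.0 * beta)
    return U, s, e

def release_weights(X, y, beta, trials=200, frac=0.5, lam=1.0, seed=0):
    """Fit on random subsets, estimate the weight covariance, then release
    one subsampled fit plus noise with covariance U diag(e) U^T."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = max(1, int(frac * n))

    def fit_once():
        idx = rng.choice(n, size=k, replace=False)
        return ridge_fit(X[idx], y[idx], lam)

    W = np.stack([fit_once() for _ in range(trials)])
    U, _, e = anisotropic_noise(W, beta)
    noise = U @ (np.sqrt(e) * rng.standard_normal(e.shape[0]))
    return fit_once() + noise
```

Under this assumed allocation, directions in which the weights barely change across subsamples receive little noise, and the Gaussian bound $\frac{1}{2}\sum_i \lambda_i / e_i$ equals $\beta$. Stronger regularization (`lam`) shrinks the spread of the sampled weights and therefore the noise, which is one way to see why regularization matters in this setting.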

Paper Structure

This paper contains 19 sections, 13 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Performance comparison (RMSE) of private linear regression using DPSGD and PAC privacy across three datasets. The posterior success rate is used to bridge the connection between differential privacy and PAC privacy (see Table \ref{tab:connect_dp_pac} for details).
  • Figure 2: Performance comparison ($R^2$) of private linear regression using DPSGD-LR and PAC-LR across three datasets. The posterior success rate bridges the connection between differential privacy and PAC privacy (see Table \ref{tab:connect_dp_pac} for details).

Theorems & Definitions (2)

  • Definition 1: $(\epsilon, \delta)$-differential privacy [dwork2006differential]
  • Definition 2: $(\delta, \rho, D)$-PAC Privacy [xiao2023pac, CCS2024]
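
For reference, Definition 1 in its standard form (the paper's exact wording is not reproduced in this listing, so the symbols $\mathcal{M}$, $X$, $X'$, and $S$ below are notational assumptions): a randomized mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-differential privacy if, for every pair of adjacent datasets $X$ and $X'$ that differ in one record and every measurable set of outputs $S$,

$$\Pr[\mathcal{M}(X) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(X') \in S] + \delta.$$

Smaller $\epsilon$ and $\delta$ mean that the output distribution changes less when a single record changes, which is the sense in which the released model "reveals little about any one data point."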