Private Sketches for Linear Regression

Shrutimoy Das, Debanuj Nayak, Anirban Dasgupta

TL;DR

The paper addresses differential privacy for linear regression by releasing differentially private sketches of the data rather than private solutions. It develops DP sketching methods for both $\ell_2$ and $\ell_1$ regression: JL-based private sketches (with and without regularization) and CountSketch-based approaches for $\ell_2$, plus an oblivious sketching framework for $\ell_1$ that yields regularized problems. A key contribution is a set of bounds on the regularization parameters that the privacy constraints induce, which clarifies when unregularized solutions are possible. This sketch-and-solve approach lets standard regression solvers be reused on the released sketch while preserving privacy, with potential computational advantages for downstream tasks.

Abstract

Linear regression is frequently applied in a variety of domains, some of which might contain sensitive information. This necessitates that the application of these methods does not reveal private information. Differentially private (DP) linear regression methods, developed for this purpose, compute private estimates of the solution. These techniques typically involve computing a noisy version of the solution vector. Instead, we propose releasing private sketches of the datasets, which can then be used to compute an approximate solution to the regression problem. This is motivated by the sketch-and-solve paradigm, where the regression problem is solved on a smaller sketch of the dataset instead of on the original problem space. The solution obtained on the sketch can also be shown to have good approximation guarantees to the original problem. Various sketching methods have been developed for improving the computational efficiency of linear regression problems under this paradigm. We adopt this paradigm for the purpose of releasing private sketches of the data. We construct differentially private sketches for the problems of least squares regression, as well as least absolute deviations regression. We show that the privacy constraints lead to sketched versions of regularized regression. We compute the bounds on the regularization parameter required for guaranteeing privacy. The availability of these private sketches facilitates the application of commonly available solvers for regression, without the risk of privacy leakage.
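To make the sketch-and-solve idea in the abstract concrete, here is a minimal, non-private illustration for least squares. It assumes a dense Gaussian sketching matrix, which is one common JL-style choice and not necessarily the construction used in the paper; the function name `sketch_and_solve` and the problem sizes are hypothetical. No privacy noise is added, so this shows only the approximation side of the paradigm.

```python
import numpy as np

def sketch_and_solve(A, b, r, rng):
    """Solve min_x ||S(Ax - b)||_2 for a dense Gaussian sketch S with r rows.

    Assumption: S has i.i.d. N(0, 1/r) entries, so E[S^T S] = I_n.
    This is a plain (non-private) sketch, used here only for illustration.
    """
    n = A.shape[0]
    S = rng.normal(size=(r, n)) / np.sqrt(r)
    # The sketched problem has r rows instead of n.
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch

rng = np.random.default_rng(0)
n, d, r = 2000, 5, 200                     # hypothetical sizes, r << n
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)   # exact least squares
x_sk = sketch_and_solve(A, b, r, rng)            # solution from the sketch

# Both costs are measured on the ORIGINAL problem.
cost_full = np.linalg.norm(A @ x_full - b)
cost_sk = np.linalg.norm(A @ x_sk - b)
print("cost ratio (sketched / optimal):", cost_sk / cost_full)
```

The ratio printed at the end compares the sketched solution against the optimum on the original data; it can never fall below 1, and a good sketch keeps it close to 1. A private variant would additionally have to perturb or regularize the sketch, which is what the paper analyzes.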

Paper Structure

This paper contains 10 sections, 4 theorems, 26 equations, 2 algorithms.

Key Result

Theorem 1

Let $S \in \{-1,0,1\}^{r \times (n + r \log r)}$ be the sketching matrix described above and let $r = \mathrm{poly}(d, \mu^{-1})$. Also, let $\eta$ be an $r \log r \times (d+1)$ matrix whose $i$th row is $\tilde{\eta}_i \sim \mathcal{N}\left(0, \frac{8B^2 \ln(1.25/\delta)}{\epsilon^2} I_{d+1}\right)$. Then, with constant probability.
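The noise matrix $\eta$ in Theorem 1 can be sampled directly from the stated distribution. The sketch below does only that; it does not reproduce the sketching matrix $S$ or the theorem's conclusion, since neither is given in this excerpt. Several details are assumptions: the function name `theorem1_noise` is hypothetical, $\log$ is taken as the natural logarithm, $r \log r$ is rounded up to an integer row count, and $B$ is treated as a user-supplied bound. Note that the per-coordinate variance $8B^2 \ln(1.25/\delta)/\epsilon^2$ equals the standard Gaussian-mechanism variance $2\Delta^2 \ln(1.25/\delta)/\epsilon^2$ with $\Delta = 2B$, which is consistent with $B$ bounding a row norm, though the excerpt does not state this.

```python
import math
import numpy as np

def theorem1_noise(r, d, B, eps, delta, rng):
    """Sample eta with i.i.d. N(0, sigma^2) entries,
    sigma^2 = 8 * B^2 * ln(1.25 / delta) / eps^2.

    Assumptions (not confirmed by the excerpt): natural log in r*log(r),
    and the row count is rounded up to the next integer.
    """
    rows = math.ceil(r * math.log(r))
    sigma = math.sqrt(8.0 * B**2 * math.log(1.25 / delta)) / eps
    eta = rng.normal(loc=0.0, scale=sigma, size=(rows, d + 1))
    return eta, sigma

rng = np.random.default_rng(0)
# Hypothetical parameter values, chosen only for illustration.
eta, sigma = theorem1_noise(r=64, d=5, B=1.0, eps=1.0, delta=1e-5, rng=rng)
print(eta.shape, sigma)
```

The noise scale grows linearly in $B$ and in $1/\epsilon$, and only logarithmically in $1/\delta$, so tightening $\epsilon$ is far more costly in added noise than tightening $\delta$.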

Theorems & Definitions (14)

  • Definition 1: Differential Privacy dwork2006privacy
  • Definition 2: $\ell_2$ sensitivity
  • Definition 3: Gaussian Mechanism dwork2006privacy
  • Definition 4: The Johnson-Lindenstrauss Lemma jlt1984jltrp2006sarlos
  • Definition 5: Subspace Embedding input2012clarkson
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • ...and 4 more