Differentially Private Iterative Screening Rules for Linear Regression

Amol Khanna, Fred Lu, Edward Raff

TL;DR

This work addresses sparse, linear regression under differential privacy by introducing a differentially private screening rule. After showing that an aggressive initial approach overscreens, the authors develop RNM-Screen, which combines privacy-budget redistribution and a per-iteration report-noisy-min selection to privately screen one coefficient at a time, achieving more controlled sparsity. Empirical results on synthetic and real-world data demonstrate that RNM-Screen reduces overscreening relative to ADP-Screen and can yield lower mean-squared error on larger datasets, with informative but dataset-dependent improvements in feature selection metrics. The study advances private sparse optimization by showing that screening-based methods can produce sparse, private models, and it outlines practical open problems such as utility bounds, budget-splitting guidelines, and extensions to broader privacy settings.
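The report-noisy-min selection mentioned above can be illustrated with a generic sketch. The per-coefficient `scores`, their `sensitivity`, and the helper `screen_one` (which zeroes the still-nonzero coefficient with the smallest noisy score) are hypothetical choices for illustration. The score RNM-Screen actually uses and how it splits its budget are not stated here. The noise calibration is the standard conservative one: if every score has sensitivity Δ, adding Laplace noise with scale 2Δ/ε to each score and releasing only the argmin is ε-differentially private.

```python
import numpy as np


def report_noisy_min(scores, sensitivity, epsilon, rng=None):
    """Return the index of the smallest score after adding Laplace noise.

    Assumes each score changes by at most `sensitivity` between neighbouring
    datasets. Laplace noise with scale 2 * sensitivity / epsilon on every
    score makes releasing only the argmin epsilon-differentially private.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("need epsilon > 0 and sensitivity > 0")
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    noise = rng.laplace(0.0, 2.0 * sensitivity / epsilon, size=scores.shape)
    return int(np.argmin(scores + noise))


def screen_one(w, scores, sensitivity, epsilon, rng=None):
    """Hypothetical screening step: privately pick one still-nonzero
    coefficient via report-noisy-min and set it to zero."""
    w = np.array(w, dtype=float)  # work on a copy; the input is not modified
    active = np.flatnonzero(w)    # indices of coefficients not yet screened
    if active.size == 0:
        return w
    pick = report_noisy_min(np.asarray(scores, dtype=float)[active],
                            sensitivity, epsilon, rng)
    w[active[pick]] = 0.0
    return w


# Usage with a hypothetical score (coefficient magnitude) and an assumed sensitivity.
rng = np.random.default_rng(0)
w = np.array([0.9, -0.02, 0.4, 0.01])
w_next = screen_one(w, np.abs(w), sensitivity=0.01, epsilon=1.0, rng=rng)
```

Because only the argmin is released, the privacy cost of one selection is ε no matter how many candidate coefficients there are, which keeps one-at-a-time screening affordable.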

Abstract

Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data science. Over the past decade, screening rules have risen in popularity as a way to eliminate features when producing the sparse regression weights of $L_1$ models. However, despite the increasing need for privacy-preserving models for data analysis, to the best of our knowledge, no differentially private screening rule exists. In this paper, we develop the first private screening rule for linear regression. We initially find that this screening rule is too strong: it screens too many coefficients as a result of the private screening step. However, a weakened implementation of private screening reduces overscreening and improves performance.
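To make "eliminating features" concrete, the sketch below implements a classical nonprivate rule, the basic strong rule for the lasso (Tibshirani et al., 2012), for the penalized objective $\frac{1}{2}\lVert \mathbf{y} - \mathbf{Xw} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_1$ with standardized columns. This is a swapped-in illustration, not the private screening rule developed in this paper. It is also a heuristic: it can occasionally discard a feature that belongs in the solution, so solvers normally re-check the KKT conditions afterwards.

```python
import numpy as np


def strong_rule_keep(X, y, lam):
    """Basic strong rule for the lasso 0.5 * ||y - Xw||^2 + lam * ||w||_1.

    lam_max = max_j |x_j^T y| is the smallest lam whose solution is w = 0.
    Feature j is discarded when |x_j^T y| < 2 * lam - lam_max. Returns a
    boolean mask of the features that survive screening.
    """
    corr = np.abs(X.T @ y)  # |x_j^T y| for every feature j
    lam_max = corr.max()
    return corr >= 2.0 * lam - lam_max


X = np.eye(3)
y = np.array([3.0, 1.0, 2.5])
keep = strong_rule_keep(X, y, lam=2.5)
```

With $\mathbf{X} = \mathbf{I}$ and $\mathbf{y} = (3, 1, 2.5)$, $\lambda_{\max} = 3$; at $\lambda = 2.5$ the threshold is $2 \cdot 2.5 - 3 = 2$, so `keep` is `[True, False, True]` and only the second feature is screened out.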

Paper Structure

This paper contains 19 sections, 6 theorems, 7 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

theorem 1

Under the conditions listed above, the sensitivity of Equation 1 when $f(\mathbf{Xw}) = \frac{1}{n} \left( \mathbf{y} - \mathbf{Xw} \right)^\top \left( \mathbf{y} - \mathbf{Xw} \right)$ is $\frac{2\lambda}{n} + \frac{2\lambda^2}{n} + \frac{1}{n} \left( 1 + \lambda \right)\sqrt{\frac{4\lambda^2/n}{1/

Figures (4)

  • Figure 1: Comparing RNM-Screen to ADP-Screen. From these graphs, it is clear that RNM-Screen distinguishes between true nonzero and true zero coefficients when screening, while ADP-Screen generally does not. However, in both cases, RNM-Screen does not achieve the same final sparsity level as ADP-Screen. Additionally, correlated features improve the performance of private screening rules.
  • Figure 2: Mean squared error loss of RNM-Screen on real-world datasets. Decreasing loss on datasets with larger $d$ indicates that learning does occur in these instances. Since we are most interested in employing sparsity to build a generalizable and interpretable model on datasets with large $d$, this indicates that RNM-Screen may be useful for this task. DP-FW is excluded from this plot because it does not produce sparse solutions, and producing sparse solutions is the purpose of this work.
  • Figure 3: Testing ADP-Screen on a synthetic dataset. Values reported are the fraction of nonzero coefficients in the output of ADP-Screen which correspond to the indices of the true nonzero and true zero coefficients. The true nonzero and true zero coefficients are known from the dataset generation procedure.
  • Figure 4: Comparing the Wolfe gap function for nonprivate and private optimization when nonprivate screening is applied at every iteration.
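As a nonprivate reference point for Figure 4, the sketch below runs standard Frank-Wolfe on the least-squares loss from Theorem 1, $f(\mathbf{Xw}) = \frac{1}{n}(\mathbf{y} - \mathbf{Xw})^\top(\mathbf{y} - \mathbf{Xw})$, over an $L_1$ ball of radius `radius`. At each step it records the Frank-Wolfe (duality) gap $\langle \nabla f(\mathbf{w}), \mathbf{w} - \mathbf{s} \rangle$, where $\mathbf{s}$ is the ball vertex minimizing $\langle \nabla f(\mathbf{w}), \mathbf{s} \rangle$. We assume this is the quantity the caption calls the Wolfe gap. The $L_1$-ball constraint, the parameter name `radius`, and the $2/(t+2)$ step size are assumptions. No noise or screening is applied, so this is neither DP-FW nor RNM-Screen.

```python
import numpy as np


def mse_and_grad(X, y, w):
    """f(Xw) = (1/n) (y - Xw)^T (y - Xw) and its gradient with respect to w."""
    n = X.shape[0]
    r = y - X @ w
    return float(r @ r) / n, (-2.0 / n) * (X.T @ r)


def frank_wolfe_l1(X, y, radius, iters=200):
    """Nonprivate Frank-Wolfe on f over the L1 ball {w : ||w||_1 <= radius}.

    Returns the final iterate and the Frank-Wolfe gap recorded before each
    update. No noise and no screening are applied.
    """
    d = X.shape[1]
    w = np.zeros(d)
    gaps = []
    for t in range(iters):
        _, g = mse_and_grad(X, y, w)
        # Linear minimization oracle for the L1 ball: the minimizing vertex
        # is -radius * sign(g_j) * e_j at the coordinate j of largest |g_j|.
        j = int(np.argmax(np.abs(g)))
        s = np.zeros(d)
        s[j] = -radius * np.sign(g[j])
        # Frank-Wolfe gap <g, w - s> >= f(w) - min f, by convexity of f.
        gaps.append(float(g @ (w - s)))
        step = 2.0 / (t + 2.0)  # standard open-loop step size
        w = (1.0 - step) * w + step * s
    return w, gaps


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, 0.0, 0.0, -0.5, 0.0])
w_hat, gaps = frank_wolfe_l1(X, y, radius=1.5)
```

Each update mixes in at most one new vertex, so after $T$ iterations the iterate has at most $T$ nonzero coordinates; long runs can therefore become dense.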

Theorems & Definitions (6)

  • theorem 1
  • lemma 1
  • theorem 2
  • theorem 1
  • lemma 2
  • theorem 4