Differentially Private Iterative Screening Rules for Linear Regression
Amol Khanna, Fred Lu, Edward Raff
TL;DR
This work addresses sparse linear regression under differential privacy by introducing a differentially private screening rule. After showing that an aggressive initial approach, ADP-Screen, overscreens, the authors develop RNM-Screen, which combines privacy-budget redistribution with a per-iteration report-noisy-min selection to privately screen one coefficient at a time, giving more controlled sparsity. Empirical results on synthetic and real-world data show that RNM-Screen reduces overscreening relative to ADP-Screen and can yield lower mean-squared error on larger datasets, while its improvements in feature-selection metrics are informative but dataset-dependent. The study advances private sparse optimization by showing that screening-based methods can produce sparse, private models, and it outlines practical open problems such as utility bounds, budget-splitting guidelines, and extensions to broader privacy settings.
Abstract
Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data science. Over the past decade, screening rules have risen in popularity as a way to eliminate features when producing the sparse regression weights of $L_1$ models. However, despite the growing need for privacy-preserving models in data analysis, to the best of our knowledge, no differentially private screening rule exists. In this paper, we develop the first private screening rule for linear regression. We initially find that this screening rule is too strong: the private screening step causes it to screen too many coefficients. However, a weakened implementation of private screening reduces overscreening and improves performance.
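As a rough illustration of the report-noisy-min selection named in the TL;DR, the sketch below shows the generic mechanism: perturb each candidate's score with Laplace noise and release only the index of the smallest noisy score. It is not the paper's RNM-Screen procedure. The function names, the `scores` input, the `sensitivity` bound, and the Laplace scale of `2 * sensitivity / epsilon` (the conservative textbook choice for scores that are not monotonic) are all assumptions made for illustration; this section does not specify how the per-coefficient screening scores or their sensitivity are computed.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def report_noisy_min(scores, sensitivity, epsilon, rng=None):
    """Return the index of the smallest score after adding Laplace noise.

    scores      -- hypothetical per-coefficient screening scores
    sensitivity -- assumed bound on how much one record can change any score
    epsilon     -- privacy budget spent on this single selection
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    scale = 2.0 * sensitivity / epsilon  # assumed conservative noise scale
    noisy = [s + laplace_noise(scale, rng) for s in scores]
    # Only the winning index is released; the noisy values stay private.
    return min(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    demo_scores = [0.05, 1.2, 0.9, 2.0]  # made-up values, not from the paper
    chosen = report_noisy_min(demo_scores, sensitivity=1.0, epsilon=0.5)
    print("coefficient selected for screening:", chosen)
```

In an iterative screening loop, one such call per iteration would pick a single coefficient to set to zero, so each iteration would consume its own share of the overall privacy budget.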
