SoK: A Review of Differentially Private Linear Models For High-Dimensional Data

Amol Khanna; Edward Raff; Nathan Inkawhich

TL;DR

The paper addresses the challenge of training differentially private (DP) linear models in high-dimensional settings, where the number of samples $n$ is smaller than the number of features $d$ ($n<d$) and both overfitting and privacy leakage are prominent. It surveys and categorizes optimization methods (Model Selection, Frank–Wolfe, Compressed Learning, ADMM, Thresholding, Coordinate Descent, Mirror Descent) and provides a systematic empirical comparison across six datasets for linear and logistic regression under various DP budgets, with code released for reproducibility. A key finding is that methods that account for per-feature scale and use robust or coordinate-wise updates often outperform Lipschitz-based approaches, but computational cost and ambiguous regularization effects remain major hurdles. The work offers practical guidance for future DP high-dimensional modeling and establishes a benchmark framework for evaluating new methods, which should enable more rapid progress in privacy-preserving high-dimensional statistics.

Abstract

Linear models are ubiquitous in data science, but are particularly prone to overfitting and data memorization in high dimensions. To guarantee the privacy of training data, differential privacy can be used. Many papers have proposed optimization techniques for high-dimensional differentially private linear models, but a systematic comparison between these methods does not exist. We close this gap by providing a comprehensive review of optimization methods for private high-dimensional linear models. Empirical tests on all methods demonstrate that robust and coordinate-optimized algorithms perform best, a finding that can inform future research. Code for implementing all methods is released online.

Paper Structure (23 sections, 8 equations, 7 figures, 9 tables)

Introduction
Preliminaries
Differential Privacy
Global and Local Differential Privacy
Sparsity, Stability, and Differential Privacy
High-Dimensional Optimization for Linear Models
Review
Model Selection
Frank-Wolfe
Compressed Learning
ADMM
Thresholding
Coordinate Descent
Mirror Descent
Implementation Details
...and 8 more sections

Figures (7)

Figure 1: A taxonomy of optimization techniques used for high-dimensional DP linear models.
Figure 2: Bodyfat: Mean Squared Error
Figure 3: PAH: Mean Squared Error
Figure 4: E2006: Mean Squared Error
Figure 5: Heart: Accuracy
...and 2 more figures
