Efficient Sparse Least Absolute Deviation Regression with Differential Privacy

Weidong Liu; Xiaojun Mao; Xiaofei Zhang; Xin Zhang

TL;DR

This work tackles privacy-preserving sparse regression under robust loss by focusing on least absolute deviation (LAD) with an $\ell_1$ penalty. It introduces FRAPPE, a fast algorithm that transforms the non-smooth LAD problem into a surrogate least-squares problem via a pseudo-response, and secures $(\epsilon,\delta)$-DP through a three-stage noise injection across initialization, kernel-density estimation, and gradient perturbation. Theoretical results establish DP guarantees and near-oracle statistical accuracy, showing a privacy-accuracy trade-off that scales with $O\left(\sqrt{p \log(1/\delta) \log(N\epsilon)} /(N\epsilon)\right)$ plus the classical $O\left(\sqrt{s \log p / N}\right)$ rate. Empirical evaluations on synthetic and real data demonstrate that FRAPPE outperforms existing private sparse regression methods, especially under heavy-tailed noise, while maintaining computational efficiency.
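To get a feel for how the two terms in the trade-off compare, the rates can be evaluated numerically. The sketch below is only illustrative: it sets every hidden constant in the $O(\cdot)$ bounds to 1, takes $N$, $p$, $s$, and $\epsilon$ from the synthetic setting quoted in the figure captions, and uses $\delta = 10^{-5}$ as a hypothetical value that the source does not specify. The function name `rate_terms` is likewise made up for this example.

```python
import math


def rate_terms(N, p, s, eps, delta):
    """Return (privacy_term, statistical_term) with all constants set to 1."""
    if N * eps <= 1 or not (0 < delta < 1) or p <= 1:
        raise ValueError("need N*eps > 1, 0 < delta < 1 and p > 1 so the logs are positive")
    # sqrt(p * log(1/delta) * log(N*eps)) / (N*eps): cost of privacy
    privacy = math.sqrt(p * math.log(1 / delta) * math.log(N * eps)) / (N * eps)
    # sqrt(s * log(p) / N): classical sparse-regression rate
    statistical = math.sqrt(s * math.log(p) / N)
    return privacy, statistical


# N, p, s, eps follow the synthetic setting in the figure captions;
# delta = 1e-5 is a hypothetical choice, not taken from the source.
privacy, statistical = rate_terms(N=5000, p=100, s=10, eps=0.5, delta=1e-5)
print(f"privacy term ~ {privacy:.4f}, statistical term ~ {statistical:.4f}")
```

Because the constants are dropped, only the relative scaling is meaningful: the privacy term shrinks roughly like $1/(N\epsilon)$, while the statistical term shrinks like $1/\sqrt{N}$.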

Abstract

In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, most privacy-preserving algorithms in the literature require the learning objective to be strongly convex and Lipschitz smooth, and therefore cannot cover a wide class of robust loss functions (e.g., quantile/least absolute loss). In this work, we develop a fast privacy-preserving learning solution for a sparse robust regression problem. Our learning loss consists of a robust least absolute loss and an $\ell_1$ sparse penalty term. To solve the non-smooth loss quickly under a given privacy budget, we develop a Fast Robust And Privacy-Preserving Estimation (FRAPPE) algorithm for least absolute deviation regression. Our algorithm achieves fast estimation by reformulating the sparse LAD problem as a penalized least squares estimation problem and adopts a three-stage noise injection to guarantee $(\epsilon,\delta)$-differential privacy. We show that our algorithm achieves a better trade-off between privacy and statistical accuracy than state-of-the-art privacy-preserving regression algorithms. Finally, we conduct experiments to verify the efficiency of the proposed FRAPPE algorithm.
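The reformulation of sparse LAD as a penalized least squares problem can be illustrated with a small sketch. This is not the authors' implementation: the pseudo-response formula, the Gaussian kernel, the bandwidth `h`, the penalty `lam`, and all function names are assumptions chosen for illustration, and the `noise_std` parameter is a placeholder that is not calibrated to any $(\epsilon,\delta)$ budget, so the code carries no privacy guarantee. The demo uses Student-t noise with 3 degrees of freedom and a non-private least squares initial estimate.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def pseudo_response(X, y, beta0, h):
    """Build a surrogate response around an initial estimate beta0.

    Assumed form (not stated in the text above):
        y_tilde_i = x_i' beta0 - (1{r_i <= 0} - 1/2) / f_hat(0),
    where r_i = y_i - x_i' beta0 and f_hat(0) is a Gaussian-kernel
    estimate of the error density at zero with bandwidth h.
    """
    fitted = X @ beta0
    r = y - fitted
    f0 = np.mean(np.exp(-0.5 * (r / h) ** 2) / np.sqrt(2 * np.pi)) / h
    return fitted - ((r <= 0).astype(float) - 0.5) / f0


def noisy_lasso(X, y_tilde, lam, n_iter=200, noise_std=0.0, rng=None):
    """Proximal gradient (ISTA) for (1/2n)||X b - y_tilde||^2 + lam * ||b||_1.

    noise_std adds Gaussian noise to each gradient; it is a placeholder
    and is NOT calibrated to any (epsilon, delta) privacy budget.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y_tilde) / n
        grad = grad + noise_std * rng.standard_normal(p)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Small demo on simulated data (all settings here are hypothetical).
rng = np.random.default_rng(0)
n, p = 2000, 20
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_t(3, size=n)  # heavy-tailed noise

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # non-private initial estimate
for _ in range(3):  # a few rounds of surrogate least squares
    y_tilde = pseudo_response(X, y, beta_hat, h=0.5)
    beta_hat = noisy_lasso(X, y_tilde, lam=0.05, rng=rng)

print("l2 estimation error:", np.linalg.norm(beta_hat - beta_true))
```

The point of the design is that each round only needs a smooth least squares solve, which is why gradient-level noise injection becomes possible even though the original LAD loss is non-smooth.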

Paper Structure (14 sections, 3 theorems, 18 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Preliminaries and Related Works
Robust Linear Regression
Differential Private Regression
Problem and Algorithm
Problem Formulation
Proposed Algorithm
Main Theoretical Results
Privacy Guarantee
Statistical Accuracy
Numerical Evaluation
Synthetic Data Experiments
Real Data Analysis
Conclusion

Key Result

Theorem 1

Suppose Assumptions assum:bounded_x through assum:kernel_bound hold. Then Algorithm GT-QRE is $(\epsilon, \delta)$-DP with $\sigma^2_{\widehat{\boldsymbol{\beta}}_0} = \frac{24c_{\mathbf{x}}^2 \log(n/(N\delta))}{\epsilon^2 \lambda_{02}^2 N^2}$, $\sigma^2_{\widehat{f}_{v}} = \frac{24B^2\log(1/\del

Figures (5)

Figure 1: The probability densities of the three noise distributions. The $\mathrm{Cauchy}$ distribution has the heaviest tail and $N(0,1)$ has the lightest tail.
Figure 2: The MSE vs the sparsity level $s$ ranging from 1 to 30. The three panels correspond to different noise distributions under sample size $N=5000$, dimension $p=100$, and privacy budget $\epsilon=0.5$. Compared with the existing algorithms, our method achieved the best performance.
Figure 3: The MSE vs the privacy budget $\epsilon$ ranging from 0 to 1. The three panels correspond to different noise distributions under sample size $N=5000$, dimension $p=100$, and sparsity level $s=10$. Compared with the existing algorithms, our method achieved the best performance.
Figure 4: The MSE vs computation time. We report the estimation MSE as computation time increases. The two figures correspond to different sample sizes $N=2000$ and $N=5000$ under $\mathrm{Cauchy}$ noise, sparsity level $s=10$, dimension $p=100$, and privacy budget $\epsilon=0.5$. Compared with the SgpLAD algorithm, our method performed better.
Figure 5: Boxplots of the normalized responses in the training data for the Ames Housing and Communities and Crime datasets.

Theorems & Definitions (10)

Definition 1: $(\epsilon, \delta)$-Differential Privacy
Definition 2: $\ell_2$-Sensitivity
Remark 1
Remark 2
Theorem 1
Theorem 2
Remark 3
Theorem 3
Remark 4
Remark 5
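The two definitions listed above, $(\epsilon, \delta)$-differential privacy and $\ell_2$-sensitivity, are commonly combined through the Gaussian mechanism. The sketch below uses the textbook calibration $\sigma = \Delta_2 \sqrt{2\ln(1.25/\delta)}/\epsilon$, which is valid for $0 < \epsilon < 1$. It is a generic illustration rather than the paper's noise calibration, and the names `gaussian_sigma` and `private_mean`, along with the clipping step, are assumptions made for this example.

```python
import math

import numpy as np


def gaussian_sigma(l2_sensitivity, eps, delta):
    """Noise scale of the classical Gaussian mechanism (requires 0 < eps < 1)."""
    if not (0 < eps < 1) or not (0 < delta < 1):
        raise ValueError("classical calibration needs 0 < eps < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps


def private_mean(rows, clip_norm, eps, delta, rng):
    """Release the mean of row vectors under (eps, delta)-DP.

    Each row is clipped to l2 norm at most clip_norm, so replacing one row
    changes the mean by at most 2 * clip_norm / N in l2 norm.
    """
    rows = np.asarray(rows, dtype=float)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    clipped = rows * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * clip_norm / len(rows)  # replace-one neighbouring datasets
    sigma = gaussian_sigma(sensitivity, eps, delta)
    return clipped.mean(axis=0) + sigma * rng.standard_normal(rows.shape[1])


rng = np.random.default_rng(1)
data = np.ones((1000, 3))
print(private_mean(data, clip_norm=1.0, eps=0.5, delta=1e-5, rng=rng))
```

The sensitivity shrinks like $1/N$, so the injected noise becomes negligible relative to the sampling error as the sample size grows.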
