Better Locally Private Sparse Estimation Given Multiple Samples Per User

Yuheng Ma; Ke Jia; Hanfang Yang

TL;DR

The paper studies sparse linear regression under user-level local differential privacy (ULDP) and shows that, when each user holds multiple samples, the linear dependence on the ambient dimension $d$ can be removed: the error bound becomes $\mathcal{O}(s^{*2}/(nm\varepsilon^2))$, compared with the item-level LDP rate $\mathcal{O}(d s^{*}/(nm\varepsilon^2))$. It introduces a two-stage framework: first, private candidate-variable selection via heavy hitters to identify a support of size $s^*$, then estimation in the reduced $s^*$-dimensional space. Two estimation protocols are developed: a multi-round protocol based on ULDP stochastic convex optimization (SCO) and a faster two-round protocol based on ULDP mean estimation. Their upper bounds scale with $s^*$, $m$ and $n$, with limited dependence on $d$. Comparing the LDP lower bound with the ULDP bounds shows that ULDP can outperform LDP in sparse settings, and experiments on synthetic and real data support the advantage of using multiple local samples per user. The framework also extends to general sparse estimation problems.
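As a back-of-the-envelope reading of the two rates quoted above (a sketch that ignores constants and any logarithmic factors, which this summary does not specify), their ratio is

$$\frac{d\, s^{*}/(nm\varepsilon^2)}{s^{*2}/(nm\varepsilon^2)} = \frac{d}{s^{*}}.$$

For illustrative values $d = 10^4$ and $s^* = 10$ (numbers chosen here, not taken from the paper), the user-level bound is smaller by a factor of about $10^3$. The two rates coincide only when $s^*$ is of the same order as $d$, that is, when the problem is not sparse.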

Abstract

Previous studies yielded discouraging results for item-level locally differentially private linear regression under an $s^*$-sparsity assumption, where the minimax rate for $nm$ samples is $\mathcal{O}(s^{*}d / (nm\varepsilon^2))$. This is challenging for high-dimensional data, where the dimension $d$ is extremely large. In this work, we investigate user-level locally differentially private sparse linear regression. We show that with $n$ users each contributing $m$ samples, the linear dependence on the dimension $d$ can be eliminated, yielding an error upper bound of $\mathcal{O}(s^{*2} / (nm\varepsilon^2))$. We propose a framework that first selects candidate variables and then conducts estimation in the narrowed low-dimensional space; the framework extends to general sparse estimation problems with tight error bounds. Experiments on both synthetic and real datasets demonstrate the superiority of the proposed methods. Both the theoretical and empirical results suggest that, with the same number of samples, locally private sparse estimation is better conducted when multiple samples per user are available.
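The select-then-estimate idea can be illustrated with a minimal Python sketch. This is not the paper's protocol: the mechanisms below (k-ary randomized response for voting on a candidate index, and clipped local least squares with Laplace noise) are simple stand-ins chosen for this illustration, and every function name and parameter (`krr_report`, `select_candidates`, `estimate_on_support`, `make_users`, `clip`) is hypothetical. Disjoint user groups are used for the two stages, so each user sends a single $\varepsilon$-private report.

```python
import numpy as np

def krr_report(true_index, d, eps, rng):
    """k-ary randomized response over {0, ..., d-1} (an assumed stand-in mechanism)."""
    p_keep = np.exp(eps) / (np.exp(eps) + d - 1)
    if rng.random() < p_keep:
        return int(true_index)
    other = int(rng.integers(d - 1))  # uniform over the d-1 remaining indices
    return other if other < true_index else other + 1

def select_candidates(users, s, eps, rng):
    """Stage 1: each user privately votes for one of its local top-s coordinates."""
    d = users[0][0].shape[1]
    counts = np.zeros(d)
    for X, y in users:
        score = np.abs(X.T @ y) / len(y)   # local marginal correlation with y
        top = np.argsort(score)[-s:]       # local top-s coordinates
        counts[krr_report(rng.choice(top), d, eps, rng)] += 1
    # Debiasing the counts is an increasing affine map, so the ranking is unchanged.
    return np.sort(np.argsort(counts)[-s:])

def estimate_on_support(users, support, eps, clip, rng):
    """Stage 2: noisy average of clipped local least-squares fits on the support."""
    k = len(support)
    reports = []
    for X, y in users:
        local, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        local = np.clip(local, -clip, clip)  # each coordinate lies in [-clip, clip]
        # L1 sensitivity of the clipped vector is at most 2 * clip * k.
        reports.append(local + rng.laplace(scale=2 * clip * k / eps, size=k))
    return np.mean(reports, axis=0)

def make_users(n, m, beta, noise_sd, rng):
    """Synthetic data: n users, each with m samples from y = X @ beta + noise."""
    users = []
    for _ in range(n):
        X = rng.standard_normal((m, len(beta)))
        users.append((X, X @ beta + noise_sd * rng.standard_normal(m)))
    return users

rng = np.random.default_rng(0)
beta = np.zeros(30)
beta[[3, 17, 25]] = [1.0, -1.0, 0.8]
users = make_users(n=600, m=50, beta=beta, noise_sd=0.1, rng=rng)
support = select_candidates(users[:300], s=3, eps=4.0, rng=rng)
estimate = estimate_on_support(users[300:], support, eps=4.0, clip=2.0, rng=rng)
print("selected support:", support)
print("estimate on support:", np.round(estimate, 2))
```

In this sketch the noise added in the second stage scales with the size of the selected support rather than with $d$, which is the intuition behind estimating in the narrowed space. The role of $m$ shows up in the first stage: more samples per user make each local top-$s$ vote more reliable before privatization.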

Paper Structure (38 sections, 18 theorems, 91 equations, 6 figures, 4 tables, 9 algorithms)

Introduction
ULDP Sparse Linear Regression
Preliminaries
Related Work
Minimax Lower Bound
An Algorithm
Candidate Variable Selection
Coefficient Estimation
A Multi-round Protocol via SCO
A Two Round Protocol
Extension to Sparse Estimation
Experiment Results
Simulation
Real Data
Discussion
...and 23 more sections

Key Result

Proposition 2.2

Let $\mathcal{H}$ be a distribution class satisfying the model assumption (equ:modelassumption) for $0\leq a \leq 1$. Let data $\{(X_i, y_i)\}_{i=1}^n$ be generated from the model (equ:modelassumption) with $n = n'm'$ and $m = 1$. For $0<\varepsilon \leq 1$, let $\beta_{\varepsilon}$ be any $\varepsilon$-LDP estimator of $\beta^*$. Th

Figures (6)

Figure 1: Illustration of the proposed sparse estimation framework.
Figure 2: Experiments w.r.t. $d$ and $\varepsilon$. We plot the quantiles over 30 repetitions with $95\%$ coverage. We exclude LDPPROX from the last three figures since it is highly unstable and does not fit into our plot scale.
Figure 3: Experiments w.r.t. $m$ and $\ell_2$ error.
Figure 4: Experiments w.r.t. $n$ and $n / m$.
Figure 5: Experiments w.r.t. $d$ and $\varepsilon$ for correlated marginal. We plot the quantiles over 30 repetitions with $95\%$ coverage. We exclude LDPPROX from some figures since it is highly unstable and does not fit into our plot scale.
...and 1 more figure

Theorems & Definitions (37)

Definition 2.1: User-level local differential privacy
Proposition 2.2: LDP lower bound
Proposition 2.3: Necessity of sufficiently large $\mathbf{m}$
Theorem 2.4: ULDP lower bound
Definition 3.1: $\mathbf{\alpha}$-Good selector
Proposition 3.2: Existence of good selectors
Proposition 3.3
Theorem 3.4: Informal
Proposition 3.5
Theorem 3.6
...and 27 more
