On the Computational Complexity of Private High-dimensional Model Selection

Saptarshi Roy, Zehua Wang, Ambuj Tewari

TL;DR

A differentially private best subset selection method with strong statistical utility properties is proposed by adopting the well-known exponential mechanism for selecting the best model in a high-dimensional sparse linear regression model under privacy constraints.

Abstract

We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private (DP) best subset selection method with strong statistical utility properties by adopting the well-known exponential mechanism for selecting the best model. To achieve computational expediency, we propose an efficient Metropolis-Hastings algorithm and under certain regularity conditions, we establish that it enjoys polynomial mixing time to its stationary distribution. As a result, we also establish both approximate differential privacy and statistical utility for the estimates of the mixed Metropolis-Hastings chain. Finally, we perform some illustrative experiments on simulated data showing that our algorithm can quickly identify active features under reasonable privacy budget constraints.
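The sampling idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' algorithm: the utility is taken to be the negative residual sum of squares, the proposal swaps one selected feature for one unselected feature, and `sensitivity` is a hypothetical placeholder rather than a calibrated bound. The sketch makes no privacy claim; it only shows a Metropolis-Hastings walk whose stationary distribution has the exponential-mechanism form.

```python
import numpy as np

def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    Xs = X[:, sorted(subset)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)

def mh_subset_sampler(X, y, s, epsilon, sensitivity, n_iter, rng):
    """Random walk over size-s subsets targeting
    pi(subset) proportional to exp(-epsilon * RSS(subset) / sensitivity)."""
    p = X.shape[1]
    current = set(rng.choice(p, size=s, replace=False).tolist())
    current_score = -epsilon * rss(X, y, current) / sensitivity
    for _ in range(n_iter):
        # Symmetric proposal: swap one selected index for one unselected index.
        out_j = int(rng.choice(sorted(current)))
        in_j = int(rng.choice(sorted(set(range(p)) - current)))
        proposal = (current - {out_j}) | {in_j}
        proposal_score = -epsilon * rss(X, y, proposal) / sensitivity
        # Because the proposal is symmetric, the acceptance ratio is pi(new) / pi(old).
        if np.log(rng.uniform()) < proposal_score - current_score:
            current, current_score = proposal, proposal_score
    return current

# Simulated data (hypothetical sizes): features 0 and 1 are the active ones.
rng = np.random.default_rng(0)
n, p, s = 200, 10, 2
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + rng.standard_normal(n)
selected = mh_subset_sampler(X, y, s, epsilon=1.0, sensitivity=1.0, n_iter=500, rng=rng)
```

With a signal this strong, the walk is expected to settle on the two active features; smaller values of `epsilon` flatten the target distribution and make the walk noisier.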

Paper Structure (37 sections, 12 theorems, 72 equations, 4 figures, 2 tables)

Key Result

Lemma 1

The exponential mechanism $\mathcal{A}_E(D)$, which outputs samples from the associated probability distribution, preserves $(2\varepsilon, 0)$-differential privacy. If the utility function $u(\cdot, \cdot)$ is data monotone, then it preserves $(\varepsilon, 0)$-differential privacy.
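As a rough illustration of the mechanism in Lemma 1, the sketch below samples from a finite candidate set. It assumes the common scaling in which a candidate with utility $u$ is drawn with probability proportional to $\exp(\varepsilon u / \Delta u)$, where $\Delta u$ is the sensitivity of $u$; this scaling is consistent with the $(2\varepsilon, 0)$ guarantee quoted above, but the paper's exact display is not shown here. The function name, the utilities, and the parameter values are hypothetical.

```python
import numpy as np

def exponential_mechanism(utilities, epsilon, sensitivity, rng):
    """Draw one index with probability proportional to exp(epsilon * u / sensitivity)."""
    scores = epsilon * np.asarray(utilities, dtype=float) / sensitivity
    scores -= scores.max()  # shift for numerical stability; the distribution is unchanged
    probs = np.exp(scores)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs

rng = np.random.default_rng(0)
utilities = [-5.0, -1.0, -0.5]  # hypothetical utilities of three candidate models
idx, probs = exponential_mechanism(utilities, epsilon=1.0, sensitivity=1.0, rng=rng)
```

Candidates with higher utility receive exponentially more mass, and a smaller `epsilon` pushes `probs` toward the uniform distribution.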

Figures (4)

  • Figure 1: Metropolis-Hastings random walk under different privacy budgets and $\ell_1$ regularization. (Strong signal)
  • Figure 2: Metropolis-Hastings random walk under different privacy budgets and $\ell_1$ regularization. (Weak signal)
  • Figure 3: Gaussian setting Metropolis-Hastings random walk under different privacy budgets and $\ell_1$ regularization. (Strong signal)
  • Figure 4: Gaussian setting Metropolis-Hastings random walk under different privacy budgets and $\ell_1$ regularization. (Weak signal)

Theorems & Definitions (19)

  • Definition 1: $(\varepsilon, \delta)$-DP, dwork2006differential
  • Lemma 1: durfee2019practical, mcsherry2007mechanism
  • Lemma 2
  • Lemma 3: Sensitivity bound and DP
  • Theorem 1: Utility guarantee
  • Remark 1
  • Remark 2
  • Lemma 4
  • Theorem 2: Rapid mixing time
  • Corollary 1
  • ...and 9 more
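
For reference, $(\varepsilon, \delta)$-differential privacy (Definition 1 in the list above) is usually stated as follows. This is the standard textbook form; the paper's own notation may differ. A randomized algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-DP if, for every pair of datasets $D, D'$ differing in a single entry and every measurable set $S$ of outputs,

$$
\mathbb{P}\big(\mathcal{A}(D) \in S\big) \le e^{\varepsilon}\, \mathbb{P}\big(\mathcal{A}(D') \in S\big) + \delta .
$$

Setting $\delta = 0$ gives the pure guarantee of the kind stated in Lemma 1, while $\delta > 0$ corresponds to the approximate differential privacy mentioned in the abstract.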