Faster Algorithms for User-Level Private Stochastic Convex Optimization

Andrew Lowy, Daogao Liu, Hilal Asi

TL;DR

This paper develops a linear-time algorithm with state-of-the-art excess risk (among non-trivial linear-time algorithms) under a mild smoothness assumption, and a second algorithm that achieves optimal excess risk for arbitrary smooth losses in $\approx (mn)^{9/8}$ gradient computations.

Abstract

We study private stochastic convex optimization (SCO) under user-level differential privacy (DP) constraints. In this setting, there are $n$ users (e.g., cell phones), each possessing $m$ data items (e.g., text messages), and we need to protect the privacy of each user's entire collection of data items. Existing algorithms for user-level DP SCO are impractical in many large-scale machine learning scenarios because: (i) they make restrictive assumptions on the smoothness parameter of the loss function and require the number of users to grow polynomially with the dimension of the parameter space; or (ii) they are prohibitively slow, requiring at least $(mn)^{3/2}$ gradient computations for smooth losses and $(mn)^3$ computations for non-smooth losses. To address these limitations, we provide novel user-level DP algorithms with state-of-the-art excess risk and runtime guarantees, without stringent assumptions. First, we develop a linear-time algorithm with state-of-the-art excess risk (for a non-trivial linear-time algorithm) under a mild smoothness assumption. Our second algorithm applies to arbitrary smooth losses and achieves optimal excess risk in $\approx (mn)^{9/8}$ gradient computations. Third, for non-smooth loss functions, we obtain optimal excess risk in $n^{11/8} m^{5/4}$ gradient computations. Moreover, our algorithms do not require the number of users to grow polynomially with the dimension.
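To get a feel for the runtime gap described in the abstract, the stated gradient-complexity bounds can be evaluated for concrete values of $n$ and $m$. The following is a minimal sketch: the function name and dictionary keys are illustrative and not from the paper, and the values ignore the constants and logarithmic factors hidden by $\approx$, so they indicate orders of magnitude only.

```python
def gradient_complexities(n: int, m: int) -> dict:
    """Evaluate the gradient-complexity expressions quoted in the abstract.

    n: number of users, m: data items per user.
    Constants and log factors are omitted, so these are rough scales only.
    """
    total = n * m  # total number of data items, mn
    return {
        # Prior work, as quoted in the abstract
        "prior_smooth": total ** 1.5,       # at least (mn)^{3/2}
        "prior_nonsmooth": total ** 3,      # (mn)^3
        # Bounds for the algorithms in this paper
        "linear_time": float(total),        # mn
        "smooth_optimal": total ** (9 / 8), # about (mn)^{9/8}
        "nonsmooth_optimal": n ** (11 / 8) * m ** (5 / 4),
    }


if __name__ == "__main__":
    # Hypothetical problem size chosen only for illustration.
    for name, value in gradient_complexities(n=1000, m=100).items():
        print(f"{name:>18}: {value:.3e}")
```

For any $n, m > 1$ these expressions are ordered as linear time, then the smooth optimal bound, then the non-smooth optimal bound, then the prior smooth bound, then the prior non-smooth bound.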

Paper Structure

This paper contains 28 sections, 25 theorems, 68 equations, 5 figures.

Key Result

Theorem 2.1

Let $\varepsilon \leq 10$, $n^{1 - o(1)} \gtrsim \frac{\log(n/\delta)}{\varepsilon}$, $\beta \leq (L/R) \sqrt{d m n \varepsilon}$, and $m \lesssim \text{poly}(n)$. Then the algorithm labeled "phased ERM without regularization" is $(\varepsilon, \delta)$-user-level DP. Further, the gradient complexity of this algorithm is $\leq nm$.

Figures (5)

  • Figure 1: Optimal algorithms for user-level DP SCO. We omit logarithms and fix $L = R = \varepsilon = 1$ and $n = d$.
  • Figure 2: User-Level DP Phased SGD with Outlier Iterate Removal and Output Perturbation
  • Figure 3: $\texttt{User-Level DP Accelerated Minibatch SGD}(\widehat{F}_{i}, T_i, K_i, x_{i-1}, \tau, \varepsilon, \delta)$
  • Figure 4: User-Level DP Accelerated Phased ERM with Outlier Gradient Removal
  • Figure 5: $\text{AboveThreshold}$

Theorems & Definitions (44)

  • Definition 1.3: User-Level Differential Privacy
  • Theorem 2.1: Privacy and utility of Algorithm "phased ERM without regularization" - Informal
  • Remark 2.2: State-of-the-art excess risk in linear time, without the restrictive assumptions
  • Lemma 2.3
  • Theorem 3.1: Privacy of Algorithm "accelerated phased ERM"
  • Theorem 3.2: Utility & runtime of Algorithm "accelerated phased ERM" - Informal
  • Remark 3.3: State-of-the-art runtime
  • Remark 3.4: Mild assumptions
  • Lemma 3.5: Variance Bound for Algorithm "accelerated minibatch"
  • Remark 3.6: Strongly convex losses: Optimal excess risk with state-of-the-art runtime
  • ...and 34 more