Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions

Hilal Asi, Daogao Liu, Kevin Tian

TL;DR

This work addresses DP-SCO under heavy-tailed gradient distributions by replacing the uniform Lipschitz assumption with a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions, and by introducing a reduction-based, localization-driven framework. The authors develop a DP-ERM solver based on clipped-DP-SGD, establish population-level localization, and compose the two to obtain DP-SCO rates that match known lower bounds up to polylogarithmic factors, with improved results when the Lipschitz constants are known and for smooth objectives. They also present a near-linear-time algorithm for smooth functions, using stability analyses and the sparse vector technique to ensure privacy. A specialization to smooth generalized linear models yields optimal rates with linear gradient-query complexity.
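To make the clipped-DP-SGD building block mentioned above concrete, here is a minimal sketch of a single update step in plain Python. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the fixed `clip_threshold`, and the caller-supplied `noise_std` are all hypothetical, and the calibration of the noise to a target $(\epsilon, \delta)$ budget, as well as any privacy accounting, is deliberately left out.

```python
import math
import random


def clip(vec, threshold):
    """Rescale vec so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= threshold or norm == 0.0:
        return list(vec)
    scale = threshold / norm
    return [x * scale for x in vec]


def clipped_dp_sgd_step(w, per_sample_grads, clip_threshold, noise_std,
                        step_size, rng):
    """One illustrative step: clip each per-sample gradient, average,
    add Gaussian noise to every coordinate, then take a gradient step.

    noise_std is an assumption supplied by the caller; choosing it to
    meet a privacy target is outside the scope of this sketch.
    """
    d = len(w)
    batch = len(per_sample_grads)
    avg = [0.0] * d
    for g in per_sample_grads:
        c = clip(g, clip_threshold)
        for i in range(d):
            avg[i] += c[i] / batch
    noisy = [avg[i] + rng.gauss(0.0, noise_std) for i in range(d)]
    return [w[i] - step_size * noisy[i] for i in range(d)]


if __name__ == "__main__":
    rng = random.Random(0)
    # A heavy-tailed batch: one gradient is far larger than the other.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(clipped_dp_sgd_step([0.0, 0.0], grads, clip_threshold=1.0,
                              noise_std=0.1, step_size=1.0, rng=rng))
```

Clipping bounds the influence of any single sample on the averaged gradient, which is what lets Gaussian noise of a fixed scale mask it. Under heavy tails it also introduces bias, since large gradients are truncated.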

Abstract

We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 \cdot \frac{1}{\sqrt{n}} + G_k \cdot \left(\frac{\sqrt{d}}{n\epsilon}\right)^{1 - \frac{1}{k}}$ under $(\epsilon, \delta)$-approximate differential privacy, up to a mild $\textup{polylog}(\frac{1}{\delta})$ factor, where $G_2^2$ and $G_k^k$ are the $2^{\text{nd}}$ and $k^{\text{th}}$ moment bounds on sample Lipschitz constants, nearly matching a lower bound of [Lowy and Razaviyayn 2023]. We further give a suite of private algorithms in the heavy-tailed setting which improve upon our basic result under additional assumptions, including an optimal algorithm under a known-Lipschitz constant assumption, a near-linear time algorithm for smooth functions, and an optimal linear time algorithm for smooth generalized linear models.
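As a rough numerical illustration of the error bound stated in the abstract, the following sketch evaluates its two terms for given problem parameters. The function name and the example values are assumptions chosen for illustration; constants and the polylogarithmic factor in $\frac{1}{\delta}$ are ignored, so the output indicates scaling only, not an actual guarantee.

```python
import math


def heavy_tailed_rate(n, d, eps, k, g2, gk):
    """Evaluate G_2 / sqrt(n) + G_k * (sqrt(d) / (n * eps)) ** (1 - 1/k).

    Constants and polylog(1/delta) factors are dropped (an assumption of
    this sketch), so only the dependence on n, d, eps, and k is shown.
    """
    statistical = g2 / math.sqrt(n)
    private = gk * (math.sqrt(d) / (n * eps)) ** (1.0 - 1.0 / k)
    return statistical + private


if __name__ == "__main__":
    # Hypothetical parameter values, with G_2 = G_k = 1 for simplicity.
    for k in (2, 4, 100):
        rate = heavy_tailed_rate(n=10_000, d=100, eps=1.0, k=k, g2=1.0, gk=1.0)
        print(f"k={k}: rate ~ {rate:.4f}")
```

With the moment bounds held fixed and $\frac{\sqrt{d}}{n\epsilon} < 1$, the privacy term shrinks as $k$ grows: its exponent $1 - \frac{1}{k}$ approaches 1, the exponent that appears in the uniformly Lipschitz setting.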

Paper Structure (27 sections, 31 theorems, 126 equations)

Key Result

Lemma 1

RDP has the following properties.

Theorems & Definitions (69)

  • Definition 1: Differential privacy
  • Definition 2: Rényi DP
  • Definition 3: CDP
  • Lemma 1: Mironov17
  • Definition 4: $k$-heavy-tailed private SCO
  • Lemma 2
  • proof
  • Proposition 1
  • proof
  • Lemma 3
  • ...and 59 more