
Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

Puning Zhao, Jiafei Wu, Zhe Liu, Chong Wang, Rongfei Fan, Qingming Li

TL;DR

The paper addresses stochastic convex optimization under differential privacy with heavy-tailed gradient noise. It shows that prior gradient estimators have suboptimal tail behavior, which introduces a superfluous factor of $d$ through the union bound. It proposes two methods, simple clipping and iterative updating, and derives high-probability, tail-aware risk bounds. The simple clipping method achieves near-minimax rates for small privacy budgets ($\epsilon \leq 1/\sqrt{d}$) but incurs an extra term in general. The iterative updating method attains the minimax rate $\tilde{O}\left(\sqrt{d/n} + \sqrt{d}\left(\frac{\sqrt{d}}{n\epsilon}\right)^{1-1/p}\right)$ for all $\epsilon \leq 1$, matching the Kamath et al. lower bound up to logarithmic factors. The core innovations include refined DP mean estimation under heavy tails and privacy amplification via shuffling. Together they enable practical, theory-backed DP stochastic optimization in realistic heavy-tailed settings.
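The sketch below evaluates the two terms of this rate numerically to show how they trade off. It is a minimal illustration that assumes every constant and logarithmic factor hidden by $\tilde{O}$ equals 1. The helper name `rate_terms` is hypothetical and does not come from the paper.

```python
import math


def rate_terms(n: int, d: int, eps: float, p: float) -> dict:
    """Evaluate sqrt(d/n) + sqrt(d) * (sqrt(d) / (n * eps)) ** (1 - 1/p).

    Hypothetical helper: constants and log factors hidden by the tilde-O
    are set to 1, so only the scaling in n, d, eps and p is meaningful.
    """
    if n <= 0 or d <= 0 or eps <= 0 or p <= 1:
        raise ValueError("need n, d, eps > 0 and p > 1")
    statistical = math.sqrt(d / n)  # non-private (sampling) term
    privacy = math.sqrt(d) * (math.sqrt(d) / (n * eps)) ** (1 - 1 / p)  # cost of DP
    return {"statistical": statistical, "privacy": privacy,
            "total": statistical + privacy}


if __name__ == "__main__":
    # When sqrt(d)/(n*eps) < 1, heavier tails (smaller p) give a larger privacy term.
    for p in (2.0, 4.0, 100.0):
        terms = rate_terms(n=10**6, d=100, eps=1.0, p=p)
        print(f"p={p:>5}: statistical={terms['statistical']:.4f}, "
              f"privacy={terms['privacy']:.6f}")
```

As $p \to \infty$, the exponent $1-1/p$ tends to 1, so the privacy term tends to $d/(n\epsilon)$.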

Abstract

We study convex optimization problems under differential privacy (DP). With heavy-tailed gradients, existing works achieve suboptimal rates. The main obstacle is that existing gradient estimators have suboptimal tail properties, resulting in a superfluous factor of $d$ in the union bound. In this paper, we explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. Our first method is a simple clipping approach. Under bounded $p$-th order moments of gradients, with $n$ samples, it achieves $\tilde{O}\left(\sqrt{d/n}+\sqrt{d}\left(\sqrt{d}/(n\epsilon)\right)^{1-1/p}\right)$ population risk with $\epsilon\leq 1/\sqrt{d}$. We then propose an iterative updating method, which is more complex but achieves this rate for all $\epsilon\leq 1$. The results significantly improve over existing methods. Such improvement relies on a careful treatment of the tail behavior of gradient estimators. Our results match the minimax lower bound in \cite{kamath2022improved}, indicating that the theoretical limit of stochastic convex optimization under DP is achievable.
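The clipping idea can be illustrated with a generic private mean estimator. Each per-sample gradient is clipped to an $\ell_2$ ball of radius $C$, the clipped gradients are averaged, and Gaussian noise is added. This is a minimal sketch, not the paper's algorithm. The following are all assumptions made for illustration, and the function names are hypothetical:

- norm clipping with threshold `c`;
- a replace-one neighboring relation;
- the classical Gaussian-mechanism calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$, which holds for $0<\epsilon<1$.

```python
import math
import random
from typing import List, Sequence


def clip_l2(v: Sequence[float], c: float) -> List[float]:
    """Project v onto the L2 ball of radius c by rescaling (no-op if inside)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= c:
        return list(v)
    return [x * c / norm for x in v]


def gaussian_sigma(n: int, c: float, eps: float, delta: float) -> float:
    """Noise scale for the clipped mean (hypothetical helper).

    Replacing one of n samples moves the mean of c-clipped vectors by at most
    2c/n in L2 norm; the classical calibration is (eps, delta)-DP for 0 < eps < 1.
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("classical Gaussian mechanism needs 0 < eps, delta < 1")
    return (2 * c / n) * math.sqrt(2 * math.log(1.25 / delta)) / eps


def private_clipped_mean(grads: Sequence[Sequence[float]], c: float,
                         eps: float, delta: float, rng=random) -> List[float]:
    """Clip each gradient, average, and add isotropic Gaussian noise."""
    n, d = len(grads), len(grads[0])
    clipped = [clip_l2(g, c) for g in grads]
    mean = [sum(g[j] for g in clipped) / n for j in range(d)]
    sigma = gaussian_sigma(n, c, eps, delta)
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy-tailed toy gradients: random signs times Pareto(alpha=2.5)
    # magnitudes, so only moments of order below 2.5 are finite.
    grads = [[rng.choice((-1, 1)) * rng.paretovariate(2.5) for _ in range(5)]
             for _ in range(10_000)]
    print(private_clipped_mean(grads, c=5.0, eps=0.5, delta=1e-6, rng=rng))
```

The threshold `c` balances two errors. Clipping heavy tails introduces bias, and the noise scale grows linearly in `c`. The paper's own threshold choice is not assumed here.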

Paper Structure

This paper contains 20 sections, 20 theorems, 116 equations, 1 table, and 3 algorithms.

Key Result

Lemma 1

There are several facts about DP and CDP: (1) (Advanced composition, dwork2010boosting; dwork2014algorithmic) If $\mathcal{A}_1,\ldots, \mathcal{A}_k$ are $(\epsilon, \delta)$-DP, then for any $\delta'>0$ the composition $(\mathcal{A}_1,\ldots, \mathcal{A}_k)$ is $(\sqrt{2k\ln(1/\delta')}\epsilon+k\epsilon(e^\epsilon-1), k\delta+\delta')$-DP; …
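Item (1) is easy to evaluate numerically. The sketch below uses a hypothetical function name and computes the composed parameters $\left(\sqrt{2k\ln(1/\delta')}\,\epsilon + k\epsilon(e^\epsilon-1),\ k\delta+\delta'\right)$. The $\delta$-component $k\delta+\delta'$ is taken from the standard textbook statement of advanced composition (Dwork and Roth, 2014).

```python
import math


def advanced_composition(eps: float, delta: float, k: int, delta_prime: float):
    """Return (eps_total, delta_total) for the k-fold composition of
    (eps, delta)-DP mechanisms via advanced composition, given slack delta_prime."""
    if eps < 0 or k < 1 or not 0 < delta_prime < 1:
        raise ValueError("need eps >= 0, k >= 1 and 0 < delta_prime < 1")
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1))
    delta_total = k * delta + delta_prime
    return eps_total, delta_total


if __name__ == "__main__":
    eps, delta, k = 0.01, 1e-8, 1000
    eps_adv, delta_adv = advanced_composition(eps, delta, k, delta_prime=1e-6)
    # Basic composition would give (k * eps, k * delta) = (10.0, 1e-5).
    print(f"advanced: eps={eps_adv:.3f}, delta={delta_adv:.2e}")
    print(f"basic:    eps={k * eps:.3f}, delta={k * delta:.2e}")
```

For many small-$\epsilon$ steps, the advanced bound grows roughly like $\sqrt{k}\,\epsilon$ rather than the $k\epsilon$ of basic composition. This is the usual reason it is used to account for privacy across many optimization iterations.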

Theorems & Definitions (26)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Lemma 4
  • Proof
  • Theorem 2
  • Theorem 3
  • ...and 16 more