Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

Puning Zhao; Jiafei Wu; Zhe Liu; Chong Wang; Rongfei Fan; Qingming Li

TL;DR

The paper addresses stochastic convex optimization under differential privacy (DP) with heavy-tailed gradients, showing that prior gradient estimators have suboptimal tail behavior, which inflates the dimension dependence of the risk bound. It proposes two methods, simple clipping and iterative updating, and derives high-probability, tail-aware risk bounds. The simple clipping method achieves the near-minimax rate when $\epsilon \leq 1/\sqrt{d}$ but incurs an extra term in general, while the iterative updating method attains the rate $\tilde{O}\left(\sqrt{d/n} + \sqrt{d}\left(\frac{\sqrt{d}}{n\epsilon}\right)^{1-1/p}\right)$ for all $\epsilon \leq 1$, matching the Kamath et al. lower bound up to logarithmic factors. The core innovations are refined DP mean estimation under heavy tails and privacy amplification via shuffling, which together support theory-backed DP stochastic optimization in realistic heavy-tailed settings.
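
To make the rate concrete, it can help to specialize the exponent in the privacy term. The lines below are only a direct algebraic reading of the stated bound for two values of $p$, not an additional result from the paper:

```latex
\begin{align*}
p = 2:&\quad
  \sqrt{d}\left(\frac{\sqrt{d}}{n\epsilon}\right)^{1/2}
  = \frac{d^{3/4}}{\sqrt{n\epsilon}}, \\
p \to \infty:&\quad
  \sqrt{d}\left(\frac{\sqrt{d}}{n\epsilon}\right)^{1-1/p}
  \to \frac{d}{n\epsilon}.
\end{align*}
```

With only second moments bounded, the privacy term decays as $(n\epsilon)^{-1/2}$; as more moments are bounded, the decay approaches $(n\epsilon)^{-1}$.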

Abstract

We study convex optimization problems under differential privacy (DP). With heavy-tailed gradients, existing works achieve suboptimal rates. The main obstacle is that existing gradient estimators have suboptimal tail properties, resulting in a superfluous factor of $d$ in the union bound. In this paper, we explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. Our first method is a simple clipping approach. Under bounded $p$-th order moments of gradients, with $n$ samples, it achieves $\tilde{O}(\sqrt{d/n}+\sqrt{d}(\sqrt{d}/(n\epsilon))^{1-1/p})$ population risk with $\epsilon\leq 1/\sqrt{d}$. We then propose an iterative updating method, which is more complex but achieves this rate for all $\epsilon\leq 1$. The results significantly improve over existing methods. Such improvement relies on a careful treatment of the tail behavior of gradient estimators. Our results match the minimax lower bound in \cite{kamath2022improved}, indicating that the theoretical limit of stochastic convex optimization under DP is achievable.
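
The clipping idea mentioned in the abstract can be illustrated with a generic clipped-mean estimator. This is a minimal sketch under stated assumptions, not the paper's estimator: it uses the classical Gaussian mechanism for $(\epsilon,\delta)$-DP with $\epsilon < 1$, and the names `clip_to_ball`, `clipped_private_mean`, and `radius` are hypothetical. How the clipping radius should be chosen as a function of $n$, $d$, $\epsilon$, and $p$ is not covered here.

```python
import math
import random


def clip_to_ball(vec, radius):
    """Rescale vec so its L2 norm is at most radius (no-op if already inside)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= radius:
        return list(vec)
    return [x * radius / norm for x in vec]


def clipped_private_mean(grads, radius, epsilon, delta, rng=None):
    """Average clipped per-sample gradients and add Gaussian noise.

    Assumption: classical Gaussian-mechanism calibration, valid for
    0 < epsilon < 1. This is an illustration, not the paper's algorithm.
    """
    if rng is None:
        rng = random.Random()
    n = len(grads)
    d = len(grads[0])
    clipped = [clip_to_ball(g, radius) for g in grads]
    mean = [sum(g[j] for g in clipped) / n for j in range(d)]
    # Replacing one sample changes the clipped mean by at most 2 * radius / n in L2.
    sensitivity = 2.0 * radius / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    toy_grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 0.5]]
    print(clipped_private_mean(toy_grads, radius=1.0, epsilon=0.5, delta=1e-5))
```

Clipping bounds the sensitivity of the average, which is what lets the noise scale be calibrated, but it also introduces bias when gradients are heavy-tailed. That trade-off is what a choice of clipping radius has to balance.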

Paper Structure (20 sections, 20 theorems, 116 equations, 1 table, 3 algorithms)

Introduction
Related Work
Preliminaries
Simple Clipping Method
Mean Estimation
Optimization
Iterative Updating Method
Intuition
The Mean Estimation Algorithm
Application in DP Optimization
Conclusion
Proof of Lemma \ref{lem:opt}
Proof of Theorem \ref{thm:meanest}
Proof of Lemma \ref{lem:bg}
Proof of Theorem \ref{thm:risk}
...and 5 more sections

Key Result

Lemma 1

There are several facts about DP and CDP: (1) (Advanced composition, \cite{dwork2010boosting,dwork2014algorithmic}) If $\mathcal{A}_1,\ldots, \mathcal{A}_k$ are $(\epsilon, \delta)$-DP, then the composition $(\mathcal{A}_1,\ldots, \mathcal{A}_k)$ is $(\sqrt{2k\ln(1/\delta')}\epsilon+k\epsilon(e^\epsilon-1),
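
For reference, the standard advanced composition theorem states that the $k$-fold composition of $(\epsilon,\delta)$-DP mechanisms is $(\epsilon', k\delta+\delta')$-DP for any $\delta' > 0$, with $\epsilon' = \sqrt{2k\ln(1/\delta')}\,\epsilon + k\epsilon(e^\epsilon-1)$. The sketch below evaluates that textbook bound. The function name `advanced_composition` is hypothetical, and the $k\delta+\delta'$ term is taken from the standard statement rather than from the truncated lemma text above.

```python
import math


def advanced_composition(epsilon, delta, k, delta_prime):
    """Return (eps_total, delta_total) for k-fold composition of (epsilon, delta)-DP.

    Uses the textbook advanced composition bound; delta_prime > 0 is the
    extra failure probability traded for a smaller total epsilon.
    """
    eps_total = (
        math.sqrt(2.0 * k * math.log(1.0 / delta_prime)) * epsilon
        + k * epsilon * (math.exp(epsilon) - 1.0)
    )
    delta_total = k * delta + delta_prime
    return eps_total, delta_total


if __name__ == "__main__":
    eps_total, delta_total = advanced_composition(
        epsilon=0.01, delta=1e-8, k=100, delta_prime=1e-6
    )
    # Basic composition would give k * epsilon = 1.0; the advanced bound is smaller here.
    print(eps_total, delta_total)
```

For small per-step $\epsilon$, the total privacy loss grows roughly like $\sqrt{k}$ rather than $k$, which is why this bound matters for iterative optimization methods that query the data many times.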

Theorems & Definitions (26)

Definition 1
Definition 2
Lemma 1
Lemma 2
Theorem 1
Lemma 3
Lemma 4
Proof
Theorem 2
Theorem 3
...and 16 more
