Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping

Chenhan Jin; Kaiwen Zhou; Bo Han; James Cheng; Tieyong Zeng

TL;DR

This work tackles differentially private stochastic convex optimization under heavy-tailed data by introducing a one-time gradient clipping scheme, AClip, and a private mean estimator that augments clipping with Gaussian noise. The proposed method, AClipped-dpSGD, achieves high-probability excess-risk bounds for both constrained and unconstrained convex problems, and extends to strongly convex and Hölder-continuous/non-smooth objectives, all while reducing gradient complexity compared to DP-SGD/DP-GD. The authors provide principled tuning guidelines for clipping and batch size, and validate the theory with experiments on synthetic and real datasets, including CIFAR-10 with ResNet-18. The results demonstrate faster running times and improved privacy-utility trade-offs for heavy-tailed data, making the approach practical for large-scale private learning tasks. Overall, the paper offers a new perspective on clipping strategies in private optimization and delivers high-probability guarantees across several important problem classes.

Abstract

We consider stochastic convex optimization for heavy-tailed data under a differential privacy (DP) guarantee. Most prior works on differentially private stochastic convex optimization for heavy-tailed data are either restricted to gradient descent (GD) or apply clipping multiple times within stochastic gradient descent (SGD), which is inefficient for large-scale problems. In this paper, we consider a one-time clipping strategy and provide principled analyses of its bias and private mean estimation. We establish new convergence results and improved complexity bounds for the proposed algorithm, called AClipped-dpSGD, for constrained and unconstrained convex problems. We also extend our convergence analysis to the strongly convex case and the non-smooth case (which covers generalized smooth objectives with Hölder-continuous gradients). All of the above results hold with high probability for heavy-tailed data. Numerical experiments are conducted to support the theoretical improvement.
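The one-time clipping idea described above can be illustrated with a minimal NumPy sketch. It is not the paper's AClipped-dpSGD implementation: the function names `clip`, `averaged_clip_noisy_mean`, and `per_sample_clip_mean` are hypothetical, the Student's t "gradients" are toy data, and the noise scale `sigma` is an arbitrary placeholder that is not calibrated to any $(\epsilon, \delta)$ privacy budget. The clipping level 30 and batch size 500 are borrowed from the CIFAR-10 setting reported in the figure captions.

```python
import numpy as np

def clip(v, lam):
    """Rescale v so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else v * (lam / norm)

def averaged_clip_noisy_mean(grads, lam, sigma, rng):
    """Average the per-sample gradients, clip the average ONCE, add Gaussian noise.

    sigma is a hypothetical noise scale (assumption); choosing it to meet a
    given (epsilon, delta) guarantee is outside the scope of this sketch.
    """
    g_bar = grads.mean(axis=0)                      # one average per mini-batch
    noise = rng.normal(0.0, sigma, size=g_bar.shape)
    return clip(g_bar, lam) + noise                 # a single clipping operation

def per_sample_clip_mean(grads, lam):
    """Baseline for contrast: clip every sample's gradient, then average."""
    return np.mean([clip(g, lam) for g in grads], axis=0)

rng = np.random.default_rng(0)
# Toy heavy-tailed "gradients": batch size m = 500, dimension 10.
grads = rng.standard_t(df=3, size=(500, 10))

estimate = averaged_clip_noisy_mean(grads, lam=30.0, sigma=0.1, rng=rng)
baseline = per_sample_clip_mean(grads, lam=30.0)
print(estimate.shape, baseline.shape)
```

The averaged variant calls `clip` once per mini-batch, whereas the baseline calls it once per sample, which is the efficiency contrast the abstract draws between one-time clipping and repeated clipping.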

Paper Structure (20 sections, 8 theorems, 136 equations, 7 figures, 4 tables, 2 algorithms)

Introduction
Preliminaries
Settings
Differential privacy
Mean estimation oracle with clipping
Bias of clipped methods
Private mean estimator
Convergence of algorithms for DP-SCO
Convex case
Strongly convex case
Non-smooth case
Experiments
Conclusions
Missing proofs
Proof of Theorem \ref{cccc}
...and 5 more sections

Key Result

Lemma 1

In the setting of Assumption 1, let $b(\nabla f(x,\bm{\xi}),\lambda)$ denote the non-average of AClip, i.e., ... If $\|\nabla f(x)\|_2 \leq \lambda / 2$ holds, then ...

Figures (7)

Figure 1: Trajectories of the logistic regression model for the real-world data. The top and bottom rows correspond to the Diabetes and Adult datasets, respectively.
Figure 2: Trajectories of test accuracy for different noise levels on the CIFAR-10 dataset. We set $\delta$ to $10^{-5}$, the clipping level $\lambda$ to 30, and the batch size $m$ to 500. All experiments are conducted on the NVIDIA RTX A6000 platform.
Figure 3: Trajectories of the logistic regression model for the synthetic data. The three rows correspond to the Chi-squared distribution, Laplace distribution, and Student's t-distribution, respectively.
Figure 4: Trajectories of the ridge regression model for the real-world data. The top and bottom rows correspond to the Diabetes and Adult datasets, respectively.
Figure 5: Trajectories of the ridge regression model for the synthetic data. The three rows correspond to the Chi-squared distribution, Laplace distribution, and Student's t-distribution, respectively.
...and 2 more figures

Theorems & Definitions (32)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Lemma 1
proof
Theorem 1
proof
...and 22 more
