Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping

Chenhan Jin, Kaiwen Zhou, Bo Han, James Cheng, Tieyong Zeng

TL;DR

This work tackles differentially private stochastic convex optimization under heavy-tailed data by introducing a one-time gradient clipping scheme, AClip, and a private mean estimator that augments clipping with Gaussian noise. The proposed method, AClipped-dpSGD, achieves high-probability excess-risk bounds for both constrained and unconstrained convex problems, and extends to strongly convex and Hölder-continuous/non-smooth objectives, all while reducing gradient complexity compared to DP-SGD/DP-GD. The authors provide principled tuning guidelines for clipping and batch size, and validate the theory with extensive experiments on synthetic and real datasets, including CIFAR-10 with ResNet-18. The results demonstrate faster running times and improved privacy-utility trade-offs for heavy-tailed data, making the approach practical for large-scale private learning tasks. Overall, the paper offers a new perspective on clipping strategies in private optimization and delivers concrete, high-probability guarantees across several important problem classes.

Abstract

We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP). Most prior works on differentially private stochastic convex optimization for heavy-tailed data are either restricted to gradient descent (GD) or apply clipping multiple times within stochastic gradient descent (SGD), which is inefficient for large-scale problems. In this paper, we consider a one-time clipping strategy and provide principled analyses of its bias and private mean estimation. We establish new convergence results and improved complexity bounds for the proposed algorithm, called AClipped-dpSGD, for constrained and unconstrained convex problems. We also extend our convergence analysis to the strongly convex case and the non-smooth case (which works for generalized smooth objectives with Hölder-continuous gradients). All the above results hold with high probability for heavy-tailed data. Numerical experiments are conducted to justify the theoretical improvement.
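
To make the contrast between repeated clipping and one-time clipping concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "one-time clipping" means averaging the mini-batch gradients first and clipping the mean once, and that privacy noise is isotropic Gaussian. The function names, the toy Student's t gradients, and the noise scale `sigma` are all hypothetical; in particular, `sigma` is a placeholder and is not calibrated to any privacy budget.

```python
import numpy as np


def clip(v, lam):
    """Rescale v so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return v
    return v * (lam / norm)


def per_sample_clipped_mean(grads, lam):
    """Baseline: clip each per-sample gradient, then average (m clip calls)."""
    return np.mean([clip(g, lam) for g in grads], axis=0)


def averaged_clipped_mean(grads, lam):
    """One-time clipping: average the mini-batch first, then clip once."""
    return clip(np.mean(grads, axis=0), lam)


def add_gaussian_noise(v, sigma, rng):
    """Add isotropic Gaussian noise; sigma is an uncalibrated placeholder."""
    return v + rng.normal(0.0, sigma, size=v.shape)


rng = np.random.default_rng(0)
# Toy heavy-tailed "gradients": 500 samples in 10 dimensions (assumed data).
grads = rng.standard_t(df=3, size=(500, 10))
lam = 30.0

g_once = averaged_clipped_mean(grads, lam)
g_each = per_sample_clipped_mean(grads, lam)
g_private = add_gaussian_noise(g_once, sigma=0.1, rng=rng)
```

In this sketch the one-time variant calls `clip` once per mini-batch instead of once per sample, which is the source of the efficiency gain the abstract alludes to. The bias and noise calibration that the paper analyzes are not reproduced here.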


Paper Structure

This paper contains 20 sections, 8 theorems, 136 equations, 7 figures, 4 tables, 2 algorithms.

Key Result

Lemma 1

In the setting of Assumption 1, let $b(\nabla f(x,\bm{\xi}),\lambda)$ denote the non-average of AClip, i.e., If $\|\nabla f(x)\|_2 \leq \lambda / 2$ holds, then

Figures (7)

  • Figure 1: Trajectories of the logistic regression model for the real-world data. The top and bottom rows correspond to the Diabetes and Adult datasets, respectively.
  • Figure 2: Trajectories of test accuracy for different noise levels on the CIFAR-10 dataset. We set $\delta$ to $10^{-5}$, the clipping level $\lambda$ to 30, and the batch size $m$ to 500. All experiments are conducted on an NVIDIA RTX A6000 GPU.
  • Figure 3: Trajectories of the logistic regression model for the synthetic data. The three rows correspond to the Chi-squared distribution, Laplace distribution, and Student's t-distribution, respectively.
  • Figure 4: Trajectories of the ridge regression model for the real-world data. The top and bottom rows correspond to the Diabetes and Adult datasets, respectively.
  • Figure 5: Trajectories of the ridge regression model for the synthetic data. The three rows correspond to the Chi-squared distribution, Laplace distribution, and Student's t-distribution, respectively.
  • ...and 2 more figures

Theorems & Definitions (32)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • ...and 22 more