GeoClip: Geometry-Aware Clipping for Differentially Private SGD

Atefeh Gilani, Naima Tasnim, Lalitha Sankar, Oliver Kosut

TL;DR

GeoClip tackles the DP-SGD clipping dilemma by projecting gradients into a geometry-aware basis using a transformation $M_t$ and shift $a_t$, so that noise is added along directions that preserve more utility. The authors provide a convergence guarantee and derive a closed-form solution for the optimal transformation, along with two practical estimation approaches (a full-covariance moving average and streaming rank-$k$ PCA) that reuse privatized gradients at no extra privacy cost. Empirically, GeoClip outperforms AdaClip, quantile-based clipping, and standard DP-SGD on synthetic, tabular, and image tasks under the same privacy budget, including in transfer-learning fine-tuning; low-rank PCA variants scale the method to high-dimensional settings. The reported benefits are faster convergence, reduced variance, and improved privacy-utility trade-offs.
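To make the role of the transformation and shift concrete, the following is a minimal sketch of one clip-and-perturb step in a transformed basis. It is not the authors' algorithm: the summary does not give the exact update, so the order of operations (shift by $a_t$, multiply by $M_t$, clip to the unit ball, add Gaussian noise, map back), the helper name `geo_clip_step`, and the choice to pass the inverse of $M_t$ explicitly are all assumptions made for illustration.

```python
import math
import random


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def clip_to_unit_ball(w):
    """Scale w down so that its Euclidean norm is at most 1."""
    norm = math.sqrt(sum(v * v for v in w))
    scale = min(1.0, 1.0 / norm) if norm > 0 else 1.0
    return [v * scale for v in w]


def geo_clip_step(grads, M, M_inv, a, sigma, rng):
    """Hypothetical helper: one noisy averaged gradient in a transformed basis.

    grads    : list of per-sample gradients (lists of floats)
    M, M_inv : transformation and its inverse (assumed supplied by the caller)
    a        : shift vector
    sigma    : noise standard deviation, applied in the transformed basis
    rng      : random.Random instance used to draw the Gaussian noise
    """
    d = len(a)
    total = [0.0] * d
    for g in grads:
        # Shift, then transform, then clip in the new coordinates.
        w = matvec(M, [gi - ai for gi, ai in zip(g, a)])
        w = clip_to_unit_ball(w)
        total = [t + v for t, v in zip(total, w)]
    # Add isotropic Gaussian noise to the clipped sum, then average.
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    mean_w = [v / len(grads) for v in noisy]
    # Map the noisy average back to the original coordinates.
    back = matvec(M_inv, mean_w)
    return [b + ai for b, ai in zip(back, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(geo_clip_step([[3.0, 4.0], [0.1, 0.2]], identity, identity, [0.0, 0.0], 0.5, rng))
```

With the identity transformation and a zero shift, this sketch reduces to ordinary norm clipping with threshold 1. A non-identity $M_t$ shrinks some directions before clipping, so the isotropic noise becomes anisotropic once mapped back through the inverse.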

Abstract

Differentially private stochastic gradient descent (DP-SGD) is the most widely used method for training machine learning models with provable privacy guarantees. A key challenge in DP-SGD is setting the per-sample gradient clipping threshold, which significantly affects the trade-off between privacy and utility. While recent adaptive methods improve performance by adjusting this threshold during training, they operate in the standard coordinate system and fail to account for correlations across the coordinates of the gradient. We propose GeoClip, a geometry-aware framework that clips and perturbs gradients in a transformed basis aligned with the geometry of the gradient distribution. GeoClip adaptively estimates this transformation using only previously released noisy gradients, incurring no additional privacy cost. We provide convergence guarantees for GeoClip and derive a closed-form solution for the optimal transformation that minimizes the amount of noise added while keeping the probability of gradient clipping under control. Experiments on both tabular and image datasets demonstrate that GeoClip consistently outperforms existing adaptive clipping methods under the same privacy budget.
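The claim that the transformation can be estimated at no additional privacy cost rests on the post-processing property of differential privacy: any function of already-released noisy gradients is itself private. A minimal sketch of such an estimate is an exponential moving average of the mean and covariance of the released gradients. The function name `update_moments`, the decay parameter `beta`, and the exact update form are assumptions; the paper's own update rule is not reproduced in this summary.

```python
def update_moments(mean, cov, noisy_grad, beta):
    """Hypothetical helper: moving-average update of mean and covariance.

    mean       : current mean estimate (list of floats)
    cov        : current covariance estimate (list of rows)
    noisy_grad : a previously released, already privatized gradient
    beta       : decay factor in [0, 1); larger values change the estimates more slowly
    """
    d = len(mean)
    new_mean = [beta * m + (1.0 - beta) * g for m, g in zip(mean, noisy_grad)]
    # Center the new observation on the updated mean before the outer product.
    centered = [g - m for g, m in zip(noisy_grad, new_mean)]
    new_cov = [
        [beta * cov[i][j] + (1.0 - beta) * centered[i] * centered[j] for j in range(d)]
        for i in range(d)
    ]
    return new_mean, new_cov


if __name__ == "__main__":
    mean, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for g in ([2.0, 0.0], [1.5, 0.1], [2.2, -0.1]):
        mean, cov = update_moments(mean, cov, g, beta=0.9)
    print(mean, cov)
```

Because the update only reads gradients that have already been clipped and noised, it touches no raw per-sample data. How the resulting estimates are turned into $M_t$ and $a_t$ is left to the closed-form solution derived in the paper.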

Paper Structure

This paper contains 17 sections, 3 theorems, 55 equations, 3 figures, 7 tables, 4 algorithms.

Key Result

Theorem 1

Assume $f$ has an $L$-Lipschitz continuous gradient. Further, assume the stochastic gradients are bounded, i.e., $\|\nabla f_k(\theta)\| \leq G$, and have bounded variance, i.e., $\mathbb{E}_k \|\nabla f_k(\theta) - \nabla f(\theta)\|^2 \leq \sigma_g^2$. Let $\theta^* = \arg\min_{\theta \in \ldots}$ […] where $\theta^t = (\theta_0, \dots, \theta_t)$ represents the history of parameter values up to iteration $t$ […]

Figures (3)

  • Figure 1: GeoClip results for the synthetic Gaussian dataset with 10 features. The left plot shows the average test MSE for each method over 10 epochs, with shaded regions representing the standard deviation across 20 random seeds. We observe that GeoClip achieves the fastest convergence and lowest average test MSE. The right plot shows the overall privacy budget $\varepsilon$ expended for $\delta = 10^{-5}$. This plot applies to all four algorithms, as they are tuned to achieve the same privacy level for a given number of epochs.
  • Figure 2: The left panel shows results on the synthetic Gaussian dataset with 400 features using a rank-50 PCA approximation for GeoClip. The plot displays average test accuracy $(\%)$ over 80 iterations with a batch size of 1024. GeoClip achieves the fastest convergence and highest average accuracy. The right panel shows results on the USPS dataset using a rank-100 approximation over 25 iterations with a batch size of 1024, where a similar convergence trend is observed. Shaded regions represent standard deviation across 20 random seeds.
  • Figure 3: The left plot shows the overall privacy budget $\varepsilon$ spent on training the synthetic Gaussian dataset with 400 features for $\delta = 10^{-5}$, while the middle plot shows the same for the USPS dataset with $\delta = 10^{-5}$. These plots apply to all four algorithms, which are tuned to achieve the same privacy level for a given number of iterations. The right panel compares GeoClip’s performance using the full-covariance and low-rank approximation algorithms.

Theorems & Definitions (6)

  • Remark 1
  • Theorem 1: Convergence of GeoClip
  • Theorem 2
  • Remark 2
  • Theorem: Convergence of GeoClip
  • Proof