Private Stochastic Convex Optimization with Heavy Tails: Near-Optimality from Simple Reductions
Hilal Asi, Daogao Liu, Kevin Tian
TL;DR
This work addresses differentially private stochastic convex optimization (DP-SCO) under heavy-tailed gradient distributions, replacing the uniform Lipschitz assumption with $k$-th moment bounds and introducing a reduction-based, localization-driven framework. The authors develop a DP-ERM solver based on clipped DP-SGD, establish population-level localization, and compose the two to obtain DP-SCO rates that match known lower bounds up to polylogarithmic factors, with improved results when the Lipschitz constants are known and for smooth objectives. They also present near-linear-time algorithms for smooth settings, using stability analyses and the sparse vector technique to ensure privacy. A specialization to smooth generalized linear models yields optimal rates with linear gradient-query complexity.
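To make the clipping idea concrete, here is a minimal Python sketch of one clipped, noised gradient step. It is a generic illustration and not the authors' algorithm: the function names, the clipping threshold `c`, and the noise multiplier `sigma` are assumptions introduced here, and the noise scale is not calibrated to any particular $(\epsilon, \delta)$ budget.

```python
import math
import random


def clip_l2(g, c):
    """Rescale g so its Euclidean norm is at most c (no-op if already within c)."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= c:
        return list(g)
    scale = c / norm
    return [v * scale for v in g]


def clipped_noisy_sgd_step(x, per_sample_grads, c, sigma, lr, rng):
    """One step: clip each per-sample gradient, average, add Gaussian noise, descend.

    Assumption: the noise standard deviation sigma * c / n is a placeholder;
    a real private algorithm must calibrate it to the privacy budget.
    """
    n = len(per_sample_grads)
    clipped = [clip_l2(g, c) for g in per_sample_grads]
    avg = [sum(col) / n for col in zip(*clipped)]
    noise_std = sigma * c / n
    noisy = [a + rng.gauss(0.0, noise_std) for a in avg]
    return [xi - lr * gi for xi, gi in zip(x, noisy)]


# Usage: a heavy-tailed sample (norm 5) is clipped to norm 1 before averaging.
rng = random.Random(0)
x_next = clipped_noisy_sgd_step(
    [0.0, 0.0], [[3.0, 4.0], [0.0, 0.5]], c=1.0, sigma=0.0, lr=1.0, rng=rng
)
```

Clipping bounds the influence of any single sample, which is why it is a natural tool when per-sample Lipschitz constants are only bounded in moments; the price is a bias that the analysis must control.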
Abstract
We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 \cdot \frac{1}{\sqrt{n}} + G_k \cdot \left(\frac{\sqrt{d}}{n\epsilon}\right)^{1 - \frac{1}{k}}$ under $(\epsilon, \delta)$-approximate differential privacy, up to a mild $\textup{polylog}(\frac{1}{\delta})$ factor, where $G_2^2$ and $G_k^k$ are the $2^{\text{nd}}$ and $k^{\text{th}}$ moment bounds on sample Lipschitz constants, nearly matching a lower bound of [Lowy and Razaviyayn 2023]. We further give a suite of private algorithms in the heavy-tailed setting which improve upon our basic result under additional assumptions, including an optimal algorithm under a known-Lipschitz constant assumption, a near-linear time algorithm for smooth functions, and an optimal linear time algorithm for smooth generalized linear models.
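As a rough numerical illustration of the stated rate, the following Python sketch evaluates the two terms of the error bound while ignoring the polylogarithmic factors and any constants. The function name `heavy_tailed_rate` and the sample parameter values are assumptions chosen for illustration only.

```python
import math


def heavy_tailed_rate(n, d, eps, g2, gk, k):
    """Evaluate G_2 / sqrt(n) + G_k * (sqrt(d) / (n * eps)) ** (1 - 1/k).

    Constants and polylog(1/delta) factors are omitted, so this shows
    scaling behavior only, not an actual error guarantee.
    """
    statistical = g2 / math.sqrt(n)
    private = gk * (math.sqrt(d) / (n * eps)) ** (1.0 - 1.0 / k)
    return statistical + private


# Hypothetical values: n = 10,000 samples, d = 100 dimensions, eps = 1.
rate_k2 = heavy_tailed_rate(n=10_000, d=100, eps=1.0, g2=1.0, gk=1.0, k=2)
rate_k8 = heavy_tailed_rate(n=10_000, d=100, eps=1.0, g2=1.0, gk=1.0, k=8)
```

With $G_k$ held fixed and $\frac{\sqrt{d}}{n\epsilon} < 1$, the private term shrinks as $k$ grows, since the exponent $1 - \frac{1}{k}$ approaches $1$ and the expression approaches the familiar $G_k \cdot \frac{\sqrt{d}}{n\epsilon}$ form.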
