Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits
Abdurakhmon Sadiev, Peter Richtárik, Ilyas Fatkhullin
TL;DR
This work initiates a theoretical framework for second-order stochastic optimization (SOSO) under heavy-tailed noise, modeled by the $p$-BCM assumption (stochastic gradients and Hessians with only bounded $p$-th moments, $p\in(1,2]$), and derives minimax lower bounds that reveal the intrinsic sample-complexity limits of SOSO. It then introduces a near-optimal second-order method (NSGD_SOM) that attains these limits in the regime $L\le\sigma_h$ without requiring Hessian Lipschitzness, and a Hessian-clipping variant (Clip NSGD_SOM) with high-probability convergence guarantees at poly-logarithmic overhead. Robustness comes from clipping not only gradients but also Hessian-vector products, which yields high-probability convergence when both the first- and second-order oracles are subject to heavy-tailed noise. Together, the results give the first comprehensive sample-complexity characterization of SOSO under heavy-tailed noise and position Hessian clipping as a principled, robust design choice for second-order algorithms in such regimes, with empirical validation on synthetic heavy-tailed tasks.
Abstract
Heavy-tailed noise is pervasive in modern machine learning applications, arising from data heterogeneity, outliers, and non-stationary stochastic environments. While second-order methods can significantly accelerate convergence in light-tailed or bounded-noise settings, such algorithms are often brittle and lack guarantees under heavy-tailed noise -- precisely the regimes where robustness is most critical. In this work, we take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We consider a setting where stochastic gradients and Hessians have only bounded $p$-th moments, for some $p\in (1,2]$, and establish tight lower bounds on the sample complexity of any second-order method. We then develop a variant of normalized stochastic gradient descent that leverages second-order information and provably matches these lower bounds. To address the instability caused by large deviations, we introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits. Our results provide the first comprehensive sample complexity characterization for second-order optimization under heavy-tailed noise. This positions Hessian clipping as a robust and theoretically sound strategy for second-order algorithm design in heavy-tailed regimes.
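To make the idea of clipping both oracles concrete, the following is a minimal sketch of a single clipped, normalized update. It is not the paper's algorithm: the Hessian-corrected momentum recursion, the function names (`clip`, `clipped_normalized_step`), and the parameters (`gamma`, `alpha`, `lam_g`, `lam_h`) are illustrative assumptions, and the abstract does not specify the exact update rule or the parameter schedules.

```python
import numpy as np


def clip(v, lam):
    """Standard norm clipping: rescale v so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return v.copy()
    return v * (lam / norm)


def clipped_normalized_step(x, d_prev, grad_sample, hvp_sample,
                            gamma, alpha, lam_g, lam_h):
    """One illustrative update (assumed form, not taken from the source).

    grad_sample: stochastic gradient at the current point x.
    hvp_sample:  stochastic Hessian-vector product along (x - x_prev),
                 assumed to be computed by the caller.
    """
    g = clip(grad_sample, lam_g)   # tame heavy-tailed gradient noise
    h = clip(hvp_sample, lam_h)    # tame heavy-tailed Hessian-vector noise
    # Hypothetical momentum estimator corrected by second-order information.
    d = (1.0 - alpha) * (d_prev + h) + alpha * g
    norm = np.linalg.norm(d)
    if norm == 0.0:
        return x.copy(), d         # no direction available, stay in place
    # Normalized step: the iterate moves by exactly gamma in Euclidean norm.
    return x - gamma * d / norm, d


# Usage: a single step with extreme samples from both oracles.
x0 = np.zeros(2)
x1, d1 = clipped_normalized_step(
    x0, d_prev=np.zeros(2),
    grad_sample=np.array([30.0, 40.0]),   # norm 50, clipped to norm 5
    hvp_sample=np.array([1000.0, 0.0]),   # norm 1000, clipped to norm 1
    gamma=0.1, alpha=0.5, lam_g=5.0, lam_h=1.0,
)
```

Because the step is normalized, a single extreme sample cannot move the iterate by more than `gamma`. Clipping additionally bounds how much such a sample can distort the running direction estimate `d`.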
