CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching
Wenzhang Du
TL;DR
CAO addresses slow optimization in sharp, anisotropic regions by introducing curvature-adaptive preconditioning based on a periodically refreshed, low-rank Hessian sketch. The method builds a rank-k sketch B from Hessian-vector products and applies the damped inverse P = (B + eta I)^{-1} to the gradient g, updating theta <- theta - alpha P g and reusing B between refreshes. Theoretically, CAO achieves the standard O(1/T) stationarity rate under L-smoothness with a widened stable stepsize range, and the loss contracts at refresh steps under a PL condition with bounded residual curvature. Empirically, on CIFAR-10/100 with ResNet-18/34, CAO with k = 1 reaches a pre-declared train-loss threshold of 0.75 2.95x faster than Adam (measured in epochs, on CIFAR-100/ResNet-18) while matching final test accuracy, and it is robust to the exact rank choice. Reproducibility is supported by anonymized logs and scripts that regenerate the figures and tables.
Abstract
First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian-vector products and preconditions gradients only in that subspace, leaving the orthogonal complement first-order. For L-smooth non-convex objectives, we recover the standard O(1/T) stationarity guarantee with a widened stable stepsize range; under a Polyak-Łojasiewicz (PL) condition with bounded residual curvature outside the sketch, the loss contracts at refresh steps. On CIFAR-10/100 with ResNet-18/34, the method enters the low-loss region substantially earlier: measured by epochs to a pre-declared train-loss threshold (0.75), it reaches the threshold 2.95x faster than Adam on CIFAR-100/ResNet-18, while matching final test accuracy. The approach is one-knob: performance is insensitive to the sketch rank k across {1,3,5}, and k=0 yields a principled curvature-free ablation. We release anonymized logs and scripts that regenerate all figures and tables.
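To make the update concrete, the following is a minimal NumPy sketch of one way the described procedure could look on a toy quadratic. It is not the released implementation. The function names (lowrank_sketch, precondition, cao_minimize), the parameters n_power and refresh_every, the use of randomized subspace iteration to build the sketch, and the clipping of negative curvature estimates are all assumptions made for illustration. The Hessian-vector product is exact here because the objective is quadratic; in a neural network it would come from automatic differentiation.

```python
import numpy as np

def lowrank_sketch(hvp, dim, k, rng, n_power=2):
    """Return (U, lam) with B = U @ diag(lam) @ U.T, built only from
    Hessian-vector products. Subspace iteration is an assumed choice."""
    if k == 0:
        return np.zeros((dim, 0)), np.zeros(0)  # curvature-free ablation
    Q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    for _ in range(n_power + 1):
        Y = np.column_stack([hvp(Q[:, j]) for j in range(k)])
        Q, _ = np.linalg.qr(Y)  # re-orthonormalize the basis
    HQ = np.column_stack([hvp(Q[:, j]) for j in range(k)])
    T = Q.T @ HQ  # k x k projection of the Hessian onto span(Q)
    lam, V = np.linalg.eigh(0.5 * (T + T.T))
    # Assumption: clip negative estimates so that B + eta*I stays positive definite.
    return Q @ V, np.maximum(lam, 0.0)

def precondition(g, U, lam, eta):
    """Apply P = (B + eta*I)^{-1} to g without forming a dim x dim matrix.
    Uses P = I/eta + U diag(1/(lam+eta) - 1/eta) U^T for orthonormal U."""
    coeff = U.T @ g
    return g / eta + U @ ((1.0 / (lam + eta) - 1.0 / eta) * coeff)

def cao_minimize(grad, hvp_at, theta0, k=1, eta=1.0, alpha=1.0,
                 steps=50, refresh_every=10, seed=0):
    """theta <- theta - alpha * P g, reusing the sketch between refreshes."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for t in range(steps):
        if t % refresh_every == 0:  # periodic refresh of the sketch
            U, lam = lowrank_sketch(lambda v: hvp_at(theta, v),
                                    theta.size, k, rng)
        theta = theta - alpha * precondition(grad(theta), U, lam, eta)
    return theta

# Toy anisotropic quadratic: one sharp direction (curvature 100), the rest in [0.5, 1].
dim = 20
curv = np.concatenate(([100.0], np.linspace(0.5, 1.0, dim - 1)))
loss = lambda th: 0.5 * np.sum(curv * th ** 2)
grad = lambda th: curv * th
hvp_at = lambda th, v: curv * v  # exact Hessian-vector product for a quadratic

theta0 = np.ones(dim)
theta_cao = cao_minimize(grad, hvp_at, theta0, k=1, alpha=1.0)
# With k=0 the update reduces to gradient descent with step alpha/eta,
# which must stay near 1/100 here to remain stable.
theta_k0 = cao_minimize(grad, hvp_at, theta0, k=0, alpha=0.01)

print("loss with k=1:", loss(theta_cao))
print("loss with k=0:", loss(theta_k0))
```

In this toy setting, damping the single sharp direction lets the k=1 run use a stepsize 100 times larger than the k=0 run, which illustrates the widened stable stepsize range in the simplest possible case. It says nothing about the CIFAR results reported above.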
