HVAdam: A Full-Dimension Adaptive Optimizer
Yiheng Zhang, Shaowu Wu, Yuanzhuo Xu, Jiajun Wu, Shang Xu, Steve Drew, Xiaoguang Niu
TL;DR
The paper identifies adaptivity in pre-conditioners as a key factor limiting generalization for adaptive optimizers in valley-like landscapes. It proposes HVAdam, a full-dimension adaptive optimizer that uses a hidden vector to capture invariant gradient trends, plus a restart strategy and a noise-aware preconditioning scheme. The authors provide convergence guarantees for both convex and non-convex settings and demonstrate substantial empirical improvements across image classification, NLP, and GAN tasks. The results suggest adaptivity can be tuned to bridge classical SGD and Adam behavior, offering a unified framework that outperforms existing optimizers.
Abstract
Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to gradient noise. We theoretically establish convergence guarantees under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers on representative image classification, diffusion, and language modeling tasks. These results demonstrate that adaptivity can serve as a valuable tunable design principle, and that Anon provides the first unified and reliable framework capable of bridging the gap between classical and modern optimizers while surpassing the advantages of both.
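To make the idea of tunable adaptivity concrete, here is a minimal sketch. It is not the update rule from the paper, which the abstract does not specify. It assumes adaptivity is controlled by an exponent `p` on the second-moment estimate, so that `p = 0` behaves like SGD with momentum and `p = 0.5` gives the usual Adam denominator. The names `adaptive_step`, `init_state`, `p`, and `max_track` are hypothetical. The optional `max_track` flag shows AMSGrad-style hard max-tracking for comparison; IDU itself is not implemented because its details are not given here.

```python
def init_state(n):
    """Optimizer state for n parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "v_max": [0.0] * n}


def adaptive_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
                  p=0.5, eps=1e-8, max_track=False):
    """One update on a list of floats with preconditioner exponent p.

    Assumption (not from the paper): the step is m_hat / (v_hat**p + eps).
      p = 0   -> denominator is about 1, i.e. SGD with momentum
      p = 0.5 -> denominator is sqrt(v_hat), i.e. Adam
    """
    state["t"] += 1
    t = state["t"]
    new_param = []
    for i, (x, g) in enumerate(zip(param, grad)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        v_used = state["v"][i]
        if max_track:
            # AMSGrad-style hard max: the tracked second moment never decreases.
            state["v_max"][i] = max(state["v_max"][i], state["v"][i])
            v_used = state["v_max"][i]
        # Bias correction, as in Adam.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = v_used / (1 - beta2 ** t)
        new_param.append(x - lr * m_hat / (v_hat ** p + eps))
    return new_param


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        x = adaptive_step([1.0], [2.0], init_state(1), p=p)
        print(f"p={p}: {x[0]:.4f}")
```

On the first step with gradient 2.0 and `lr=0.1`, `p=0` moves the parameter by about 0.2 (the raw gradient scale), while `p=0.5` moves it by about 0.1 (the gradient normalized by its magnitude). Intermediate or larger values of `p` interpolate or extrapolate between these two behaviors.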
