ANO : Faster is Better in Noisy Landscape

Adrien Kegreisz

TL;DR

Ano introduces a decoupled optimization paradigm that separates the update direction (given by the sign of the momentum) from the update magnitude (given by the instantaneous gradient magnitude, scaled by a second-moment term). This design improves robustness to gradient noise and non-stationarity while retaining first-order efficiency. Anolog extends Ano with a logarithmic momentum schedule that reduces the tuning burden. The theoretical analysis provides a non-convex convergence guarantee of $\tilde{\mathcal{O}}(K^{-1/4})$, in line with other sign-based methods, and experiments show notable gains on noisy reinforcement learning (RL) and NLP tasks, with competitive performance on standard computer vision (CV) benchmarks. Overall, Ano offers a practical, robust alternative to momentum-based adaptive optimizers in noisy landscapes, applicable across CV, NLP, and deep reinforcement learning (DRL).
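The direction/magnitude decoupling can be illustrated with a minimal NumPy sketch of a single update. The summary does not specify the exact second-moment rule, bias correction, weight decay, or default hyperparameters, so `ano_step`, its argument names and defaults, the Adam-style EMA of $g^2$ with bias correction, and the AdamW-style decoupled weight decay are all assumptions for illustration, not the authors' implementation. What it shows: the sign of the momentum $m_k$ fixes the direction, while the per-coordinate step length is $|g_k| / (\sqrt{\hat v_k} + \epsilon)$, computed from the current gradient.

```python
import numpy as np


def ano_step(theta, grad, m, v, k, lr=1e-3, beta1=0.9, beta2=0.999,
             eps=1e-8, weight_decay=0.0):
    """One Ano-style update step (illustrative sketch, not the reference code).

    Assumptions: Adam-style EMA of squared gradients for the second-moment
    term, Adam-style bias correction, and AdamW-style decoupled weight decay.
    `k` is the 1-based step counter; defaults are placeholders.
    """
    # Momentum (EMA of gradients) is used ONLY to choose the update direction.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second-moment estimate (assumed Adam-style EMA of g^2) with bias correction.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    v_hat = v / (1.0 - beta2 ** k)
    # Magnitude: instantaneous |g| normalized by sqrt(v_hat); direction: sign(m).
    step = lr * np.abs(grad) / (np.sqrt(v_hat) + eps) * np.sign(m)
    # Assumed decoupled weight decay (no effect when weight_decay == 0).
    theta = theta - step - lr * weight_decay * theta
    return theta, m, v


if __name__ == "__main__":
    # Toy problem: f(x) = 0.5 * ||x||^2 with additive Gaussian gradient noise.
    rng = np.random.default_rng(0)
    theta = np.array([1.0, -2.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for k in range(1, 501):
        grad = theta + 0.5 * rng.standard_normal(theta.shape)
        theta, m, v = ano_step(theta, grad, m, v, k, lr=1e-2)
    print(theta)
```

Because only the sign of the momentum enters the update, a noisy momentum estimate can flip the direction but cannot inflate the step length; the step length follows the current gradient relative to its running scale.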

Abstract

Stochastic optimizers are central to deep learning, yet widely used methods such as Adam and Adan can degrade in non-stationary or noisy environments, partly because they rely on momentum-based magnitude estimates. We introduce Ano, a novel optimizer that decouples direction and magnitude: momentum is used for directional smoothing, while instantaneous gradient magnitudes determine the step size. This design improves robustness to gradient noise while retaining the simplicity and efficiency of first-order methods. We further propose Anolog, which removes sensitivity to the momentum coefficient by widening the momentum window over time via a logarithmic schedule. We establish non-convex convergence guarantees with a rate comparable to that of other sign-based methods, and show empirically that Ano provides substantial gains in noisy and non-stationary regimes such as reinforcement learning, while remaining competitive on low-noise tasks.
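The abstract describes Anolog only as widening the momentum window over time with a logarithmic schedule. The sketch below shows one schedule with that property, under the hypothetical form $\beta_{1,k} = 1 - 1/\ln(k + 2)$; this exact formula and the helper names `anolog_beta1` and `effective_window` are assumptions, not taken from the text above. It uses the standard heuristic that an EMA with coefficient $\beta$ averages roughly $1/(1-\beta)$ recent gradients, so this choice makes the window equal to $\ln(k+2)$.

```python
import math


def anolog_beta1(k):
    """Hypothetical logarithmic momentum schedule (assumed form, k >= 1).

    Chosen so that the EMA's effective window 1 / (1 - beta1) equals
    ln(k + 2), i.e. it widens logarithmically with the step count.
    """
    if k < 1:
        raise ValueError("k must be a 1-based step counter")
    return 1.0 - 1.0 / math.log(k + 2)


def effective_window(beta):
    """Approximate number of recent gradients an EMA with coefficient beta averages."""
    return 1.0 / (1.0 - beta)


if __name__ == "__main__":
    for k in (1, 10, 100, 1_000, 100_000):
        b = anolog_beta1(k)
        print(f"k={k:>6}  beta1={b:.4f}  window~{effective_window(b):.2f}")
```

Under this assumed form, $\beta_{1,k}$ rises from about 0.09 at $k=1$ toward 1, so early steps react quickly while later steps average over more history, without a fixed momentum coefficient to tune.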

