AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate

Meng Zhu, Quan Xiao, Weidong Min

TL;DR

AdamNX introduces a novel exponential decay rate for the second-order moment estimate. The rate makes the second-moment correction of the step size progressively weaker, so the optimizer degrades to momentum SGD once training reaches its stable period. By defining a time-varying $\hat{\beta}_{2,t}$ that converges to 1, the method reduces jitter in the stable phase while preserving Adam's early-stage benefits. Empirical results on image classification, object detection, and semantic segmentation show consistent improvements over Adam and AdaX across multiple backbones. The code is open source, offering a practical alternative that combines fast early optimization with stable late-stage convergence and possibly better generalization.

Abstract

Since the beginning of the 21st century, artificial intelligence has been leading a new industrial revolution. Within the training framework, the optimization algorithm aims to drive a high-dimensional optimization problem stably toward local or even global minima. In the era of large language models, although the scale of model parameters and data has increased, Adam remains the mainstream optimization algorithm. However, compared with optimization algorithms based on stochastic gradient descent (SGD), Adam is more likely to converge to non-flat minima. To address this issue, we propose the AdamNX algorithm. Its core innovation is a novel exponential decay rate for the second-order moment estimate, which gradually weakens the learning-step correction as training progresses and degrades to momentum SGD in the stable training period, thereby improving training stability in that period and possibly enhancing generalization. Experimental results show that our exponential decay rate for the second-order moment estimate outperforms existing ones, and that AdamNX consistently outperforms Adam and its variants. Our code is open-sourced at https://github.com/mengzhu0308/AdamNX.
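
To illustrate the mechanism the abstract describes, the following is a minimal sketch of why a second-moment decay rate that converges to 1 freezes the per-parameter step scale, which is what makes the update behave like momentum SGD. It is not the AdamNX update rule: the paper's formula does not appear in this section, and `rising_beta2` is a hypothetical placeholder schedule chosen only because it converges to 1. The function names, the scalar setting, the omission of bias correction, and all constants are assumptions made for illustration.

```python
import math


def constant_beta2(t, beta2=0.9):
    """Adam-style fixed decay rate (0.9 chosen for a short demo, an assumption)."""
    return beta2


def rising_beta2(t, beta2=0.9, rate=0.99):
    """Hypothetical schedule, NOT the paper's formula.

    Starts at `beta2` for t = 1 and converges to 1 as t grows.
    """
    return 1.0 - (1.0 - beta2) * rate ** (t - 1)


def step_scales(schedule, grads, lr=0.01, eps=1e-8):
    """Return lr / (sqrt(v_t) + eps) for each step of a scalar gradient stream.

    v_t is the exponential moving average of squared gradients, using the
    decay rate given by `schedule(t)`. Bias correction is omitted for brevity.
    """
    v = 0.0
    scales = []
    for t, g in enumerate(grads, start=1):
        b2 = schedule(t)
        v = b2 * v + (1.0 - b2) * g * g
        scales.append(lr / (math.sqrt(v) + eps))
    return scales


# Gradient magnitude drops by 10x halfway through the run.
grads = [1.0] * 1000 + [0.1] * 1000

const_scales = step_scales(constant_beta2, grads)
rising_scales = step_scales(rising_beta2, grads)

# Ratio of the final step scale to the scale just before the magnitude change.
const_ratio = const_scales[-1] / const_scales[999]
rising_ratio = rising_scales[-1] / rising_scales[999]

if __name__ == "__main__":
    print("constant beta2 ratio:", const_ratio)
    print("rising beta2 ratio:  ", rising_ratio)
```

With the fixed rate, the step scale keeps re-adapting to recent gradient magnitudes late in training. With the rising rate, the second-moment estimate stops changing, so the step scale stays essentially constant and the remaining dynamics are governed by the momentum term alone.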

Paper Structure

This paper contains 17 sections, 19 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: Functional curves of $\overset{\frown}{\beta}_{1,t}$ and $\overset{\frown}{\beta}_{2,t}$ versus iteration number $t$ in AdamNX
  • Figure 2: Comparison between AdamNX's $\overset{\frown}{\beta}_{2,t}$ and Adam's $\overset{\frown}{\beta}_{2,t}$
  • Figure 3: Learning rate as a function of the number of iterations
  • Figure 4: Different second-order moment estimate exponential decay rates
  • Figure 5: Training loss vs. iteration (experimental logs from Table \ref{tbl-som-decay-rate-abalation-study-cifar100-SwinV2-S})
  • ...and 1 more figure