The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks

Gongyue Zhang, Honghai Liu

TL;DR

The paper proposes a way to unify SGD and adaptive optimizers within a single first-order optimization framework, motivated by gradient vanishing and explosion and by dataset sparsity in deep networks. It introduces Second-Moment Exponential Scaling (SMES), whose update $\theta_{t+1}=\theta_t-\frac{\eta}{(\hat{v}_t)^{\alpha}+\epsilon}\cdot\hat{m}_t$ uses a tunable exponent $\alpha$ to interpolate between SGD ($\alpha=0$) and Adam ($\alpha=0.5$). Key contributions include a balance theory that explains training tendencies, a method to externally estimate and adjust gradient distortion, and empirical validation of how different balance coefficients affect fitting and generalization on standard vision benchmarks. The result is a more interpretable account of optimization dynamics and practical guidance for selecting optimizer behavior without changing the network architecture.
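As a rough illustration of the update rule quoted above, the following NumPy sketch applies one SMES step. It assumes that $\hat{m}_t$ and $\hat{v}_t$ are Adam-style bias-corrected exponential moving averages of the gradient and the squared gradient; the function name `smes_step` and the defaults for `beta1`, `beta2`, and `eps` are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np


def smes_step(theta, grad, m, v, t, lr=1e-3, alpha=0.5,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One SMES-style update (illustrative sketch, not the authors' code).

    Assumption: m_hat and v_hat are the bias-corrected first and second
    moment estimates used in Adam; t is the 1-based step counter.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Bias correction, as in Adam.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # theta_{t+1} = theta_t - lr / (v_hat^alpha + eps) * m_hat
    # alpha = 0 makes the denominator a constant (momentum-SGD-like step);
    # alpha = 0.5 gives the Adam denominator sqrt(v_hat) + eps.
    theta = theta - lr / (v_hat ** alpha + eps) * m_hat
    return theta, m, v
```

Calling `smes_step` with `alpha=0.0` rescales every coordinate by the same constant, so the step follows the (momentum-averaged) gradient direction as in SGD, while `alpha=0.5` normalizes each coordinate by the root of its second moment as in Adam. Intermediate values of `alpha` give a partial normalization between the two.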

Abstract

We have identified a potential method for unifying first-order optimizers through the use of variable Second-Moment Exponential Scaling (SMES). We begin with backpropagation, addressing classic phenomena such as gradient vanishing and explosion, as well as issues related to dataset sparsity, and introduce the theory of balance in optimization. Through this theory, we suggest that SGD and adaptive optimizers can be unified under a broader inference, employing variable moving exponential scaling to achieve a balanced approach within a generalized formula for first-order optimizers. We conducted tests on several classic datasets and networks to confirm the impact of different balance coefficients on the overall training process.
