Dynamic Momentum Recalibration in Online Gradient Learning

Zhipeng Yao, Rui Yu, Guisong Chang, Ying Li, Yu Zhang, Dazhou Li

TL;DR

This work reinterprets gradient updates through the lens of signal processing and reveals that fixed momentum coefficients inherently distort the balance between bias and variance, leading to skewed or suboptimal parameter updates.

Abstract

Stochastic Gradient Descent (SGD) and its momentum variants form the backbone of deep learning optimization, yet the underlying dynamics of their gradient behavior remain insufficiently understood. In this work, we reinterpret gradient updates through the lens of signal processing and reveal that fixed momentum coefficients inherently distort the balance between bias and variance, leading to skewed or suboptimal parameter updates. To address this, we propose SGDF (SGD with Filter), an optimizer inspired by the principles of Optimal Linear Filtering. SGDF computes an online, time-varying gain to dynamically refine gradient estimation by minimizing the mean-squared error, thereby achieving an optimal trade-off between noise suppression and signal preservation. Furthermore, our approach could extend to other optimizers, showcasing its broad applicability to optimization frameworks. Extensive experiments across diverse architectures and benchmarks demonstrate SGDF surpasses conventional momentum methods and achieves performance on par with or surpassing state-of-the-art optimizers.
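The idea of an online, time-varying gain can be illustrated with a minimal sketch. It assumes a scalar gradient that is constant over time and observed with noise of known variance, and it uses the classical recursive minimum-MSE (Kalman-style) update, not the SGDF algorithm itself. The names `filter_stream`, `ema_stream`, `obs_var`, and `init_var` are hypothetical and do not come from the paper.

```python
import random

def filter_stream(observations, obs_var, init_var=1e6):
    """Recursively estimate a constant signal from noisy scalar observations.

    Hypothetical helper for illustration only; not the paper's SGDF update.
    """
    est, var = 0.0, init_var          # vague prior: large initial uncertainty
    gains = []
    for g in observations:
        k = var / (var + obs_var)     # gain that minimizes the posterior MSE
        est = est + k * (g - est)     # move toward the observation by a fraction k
        var = (1.0 - k) * var         # uncertainty shrinks after each update
        gains.append(k)
    return est, gains

def ema_stream(observations, beta=0.9):
    """Fixed-coefficient exponential moving average, for comparison."""
    est = 0.0
    for g in observations:
        est = beta * est + (1.0 - beta) * g
    return est

# Assumed toy setting: true gradient 2.0, unit-variance observation noise.
rng = random.Random(0)
true_grad = 2.0
obs = [true_grad + rng.gauss(0.0, 1.0) for _ in range(200)]

est, gains = filter_stream(obs, obs_var=1.0)
print("filtered estimate:", est)
print("EMA estimate:", ema_stream(obs))
print("first gains:", gains[:3], "last gain:", gains[-1])
```

In this toy setting the gain starts near 1 and decays with each step, so early observations move the estimate strongly and later ones only refine it. The fixed-coefficient average applies the same weight at every step regardless of how much uncertainty remains.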

Paper Structure

This paper contains 42 sections, 15 theorems, 155 equations, 13 figures, 17 tables, 1 algorithm.

Key Result

Lemma 2.2

For any gradient estimator $\hat{g}_t = \mathcal{A}(g_1,...,g_t)$, the mean squared error of the estimate decomposes as:
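
For orientation, the textbook bias-variance identity has the form below. The symbol $g_t^{*}$ for the true (noise-free) gradient is an assumption made here, and the paper's own statement of Lemma 2.2 may use different notation or carry additional terms.

$$
\mathbb{E}\left[\|\hat{g}_t - g_t^{*}\|^2\right]
= \underbrace{\left\|\mathbb{E}[\hat{g}_t] - g_t^{*}\right\|^2}_{\text{squared bias}}
+ \underbrace{\mathbb{E}\left[\left\|\hat{g}_t - \mathbb{E}[\hat{g}_t]\right\|^2\right]}_{\text{variance}}
$$

The cross term vanishes because $\mathbb{E}\left[\hat{g}_t - \mathbb{E}[\hat{g}_t]\right] = 0$ and $\mathbb{E}[\hat{g}_t] - g_t^{*}$ is deterministic.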

Figures (13)

  • Figure 1: Test accuracy ([$\mu \pm \sigma$]) on CIFAR.
  • Figure 2: Convergence comparison between Sign SGDF and Adam.
  • Figure 3: Histogram of Top 50 Hessian Eigenvalues. Lower values indicate better performance on the test dataset.
  • Figure 4: Training (top row) and test (bottom row) accuracy of CNNs on the CIFAR-10 dataset. We report the confidence interval ([$\mu \pm \sigma$]) over 3 independent runs.
  • Figure 5: Training (top row) and test (bottom row) accuracy of CNNs on the CIFAR-100 dataset. We report the confidence interval ([$\mu \pm \sigma$]) over 3 independent runs.
  • ...and 8 more figures

Theorems & Definitions (32)

  • Definition 2.1
  • Lemma 2.2
  • Theorem 2.3
  • Theorem 3.1: Convergence in Convex Optimization
  • Theorem 3.2
  • Definition A.1
  • Lemma A.3: Bias-Variance Decomposition
  • proof
  • Lemma A.4
  • proof
  • ...and 22 more