Gradient Descent Algorithm Survey
Deng Fucheng, Wang Wanjie, Gong Ao, Wang Xiaoqi, Wang Fan
TL;DR
The paper addresses the challenge of selecting and tuning optimizers for deep learning through a systematic, empirically grounded comparison of SGD, Mini-batch SGD, Momentum, Adam, and Lion. It combines theoretical context (learning-rate schedules, convergence, and averaging) with extensive experiments on MNIST and California Housing to show each method's strengths and failure modes. Its key contributions are detailed algorithmic descriptions, practical engineering guidance, and clear recommendations for when to use each optimizer (e.g., Lion and Momentum for large-scale or noisy tasks, SGD as a stable baseline, and Adam variants with cautions about generalization). The result is a standardized framework for optimizer selection and tuning across task types and hardware configurations, intended to help researchers and engineers reach faster convergence and better generalization in real-world settings.
Abstract
This article addresses the practical configuration of optimization algorithms in deep learning, focusing on five major algorithms: SGD, Mini-batch SGD, Momentum, Adam, and Lion. It systematically analyzes the core advantages and limitations of each algorithm and gives key practical recommendations for its use. The aim is to develop an in-depth understanding of these algorithms and to provide a standardized reference for selecting optimizers, tuning their parameters, and improving their performance in both academic research and engineering practice, helping to solve optimization challenges across models of different scales and a variety of training scenarios.
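As a point of orientation, the update rules of the algorithms named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this survey. The function names, the state dictionaries, the toy objective f(theta) = 0.5 * ||theta||^2, and all hyperparameter values are assumptions chosen for the example. Mini-batch SGD uses the same rule as SGD and differs only in that the gradient is averaged over a batch of samples.

```python
import numpy as np

# Hypothetical helper names; each takes parameters, a gradient, and a state
# dict, and returns the updated parameters and the updated state.

def sgd_step(theta, grad, state, lr=0.1):
    # Plain (mini-batch) SGD: step against the gradient.
    return theta - lr * grad, state

def momentum_step(theta, grad, state, lr=0.1, mu=0.9):
    # Heavy-ball momentum: accumulate a velocity, then step along it.
    v = mu * state.get("v", 0.0) + grad
    return theta - lr * v, {"v": v}

def adam_step(theta, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates.
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * grad
    v = b2 * state.get("v", 0.0) + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    new_theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_theta, {"t": t, "m": m, "v": v}

def lion_step(theta, grad, state, lr=0.01, b1=0.9, b2=0.99, wd=0.0):
    # Lion: step along the sign of an interpolated momentum,
    # with optional decoupled weight decay.
    m = state.get("m", 0.0)
    update = np.sign(b1 * m + (1 - b1) * grad)
    new_theta = theta - lr * (update + wd * theta)
    return new_theta, {"m": b2 * m + (1 - b2) * grad}

def grad_fn(theta):
    # Gradient of the toy objective f(theta) = 0.5 * ||theta||^2.
    return theta

optimizers = {
    "sgd": sgd_step,
    "momentum": momentum_step,
    "adam": adam_step,
    "lion": lion_step,
}

theta0 = np.array([1.0, -2.0])
initial_norm = float(np.linalg.norm(theta0))
final_norms = {}

for name, step in optimizers.items():
    theta, state = theta0.copy(), {}
    for _ in range(100):
        theta, state = step(theta, grad_fn(theta), state)
    final_norms[name] = float(np.linalg.norm(theta))

for name, norm in final_norms.items():
    print(f"{name}: ||theta|| after 100 steps = {norm:.4f}")
```

On this toy problem all four rules reduce the distance to the minimum at the origin, but the numbers say nothing about their relative merit on real tasks. Note that Lion stores a single momentum buffer, whereas Adam stores two moment buffers per parameter.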
