New logarithmic step size for stochastic gradient descent

M. Soheil Shamaee; S. Fathi Hafshejani; Z. Saeidian

TL;DR

The paper addresses step-size design for SGD on non-convex objectives by introducing a logarithmic step size with warm restarts, defined as $\eta_t=\eta_0\left(1-\frac{\ln t}{\ln T}\right)$. For smooth non-convex functions, the authors prove an $O\left(\frac{1}{\sqrt{T}}\right)$ convergence rate when the constant $c$ in their analysis satisfies $c\propto\frac{\sqrt{T}}{\ln T}$, and they give a two-loop warm-restart algorithm that implements the schedule. Empirically, they compare against nine baselines on FashionMNIST, CIFAR10, and CIFAR100, reporting competitive training performance and a $0.9$ percentage-point improvement in test accuracy on CIFAR100 with a CNN model. The work highlights that the new step size places more of the normalized weight $\frac{\eta_t}{\sum_{t=1}^T\eta_t}$ on later iterations than the cosine step size does, which improves the selection of the final output and the overall effectiveness on deep learning benchmarks. These results suggest practical benefits for SGD-based training, particularly for deep CNNs on standard image datasets.
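
As a quick illustration of the formula above, the following minimal sketch evaluates $\eta_t=\eta_0\left(1-\frac{\ln t}{\ln T}\right)$ for a single cycle. The function name `log_step_size`, the range $1 \le t \le T$, and the values `T = 100`, `eta0 = 0.1` are assumptions chosen for illustration, not settings taken from the paper.

```python
import math


def log_step_size(t: int, T: int, eta0: float) -> float:
    """Return eta0 * (1 - ln(t) / ln(T)) for an iteration index 1 <= t <= T.

    The name and the argument checks are illustrative assumptions.
    """
    if T < 2 or not 1 <= t <= T:
        raise ValueError("need T >= 2 and 1 <= t <= T")
    return eta0 * (1.0 - math.log(t) / math.log(T))


# One full cycle: starts at eta0 when t = 1 and decays to 0 when t = T.
schedule = [log_step_size(t, T=100, eta0=0.1) for t in range(1, 101)]
```

Because $\ln 1 = 0$, the first step equals $\eta_0$, and because the ratio $\frac{\ln t}{\ln T}$ reaches 1 at $t = T$, the last step is zero.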

Abstract

In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for SGD. We conduct comprehensive experiments to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by $0.9\%$ on the CIFAR100 dataset when a convolutional neural network (CNN) model is used.
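
The warm restart idea can be sketched as two nested loops: an outer loop over restart cycles and an inner loop that replays the logarithmic step size from $t=1$ to $t=T$. This is only a sketch under stated assumptions: it assumes every cycle uses the same $T$ and $\eta_0$, it produces the step sizes only (no gradient updates), and the name `warm_restart_schedule` and the parameter `num_cycles` are hypothetical. The paper's own algorithm may differ in these details.

```python
import math
from typing import Iterator


def warm_restart_schedule(eta0: float, T: int, num_cycles: int) -> Iterator[float]:
    """Yield step sizes for `num_cycles` restarts of length T (illustrative)."""
    if T < 2:
        raise ValueError("need T >= 2 so that ln(T) is positive")
    for _ in range(num_cycles):        # outer loop: one pass per restart cycle
        for t in range(1, T + 1):      # inner loop: t starts again from 1
            yield eta0 * (1.0 - math.log(t) / math.log(T))


# Example values (assumed): three cycles of length 30 with eta0 = 1.
steps = list(warm_restart_schedule(eta0=1.0, T=30, num_cycles=3))
```

Each cycle starts at $\eta_0$ and decays to zero, so the step size jumps back up at every restart.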

Paper Structure (11 sections, 7 theorems, 28 equations, 5 figures, 6 tables)

This paper contains 11 sections, 7 theorems, 28 equations, 5 figures, 6 tables.

Introduction
Contribution
Problem Set-Up
The New Step Size
Algorithm and convergence
Convergence results for smoothness function
Numerical Results
Experiments on MNIST, CIFAR10 and CIFAR100
Methods
Results and Discussion
Conclusion

Key Result

Lemma 2.1

For the function $\ln (x)$, we have:

Figures (5)

Figure 1: Left: The value $\frac{\eta_t}{\sum_{t=1}^T\eta_t}$ for the new step size and for the cosine step size of [li2021second]. Middle: Warm restarts simulated every $T=30$ (blue line), $T=70$ (orange line), $T=100$ (green line), and $T=200$ (magenta line) epochs with $\eta_0=1$. Right: Comparison of the warm restart Algorithm 1 with the new proposed step size and with the cosine step size on the FashionMNIST dataset.
Figure 2: Comparison of the new proposed step size and five other step sizes on the FashionMNIST, CIFAR10, and CIFAR100 datasets.
Figure 3: Comparison of the new proposed step size and four other step sizes on the FashionMNIST, CIFAR10, and CIFAR100 datasets.
Figure 4: Comparison of the new proposed step size and five other step sizes on the CIFAR100 dataset using the DenseNet-BC model.
Figure 5: Comparison of the new proposed step size and four other step sizes on the CIFAR100 dataset using the DenseNet-BC model.
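
The normalized weight $\frac{\eta_t}{\sum_{t=1}^T\eta_t}$ shown in the left panel of Figure 1 can be reproduced numerically with a short sketch like the one below. The cosine schedule is written in its commonly used form $\frac{\eta_0}{2}\left(1+\cos\frac{t\pi}{T}\right)$, which is assumed here to match the cited one; the helper name `normalized_weights` and the values `T = 100`, `eta0 = 1.0` are likewise assumptions for illustration.

```python
import math


def normalized_weights(etas: list[float]) -> list[float]:
    """Divide each step size by the sum of all step sizes in the cycle."""
    total = sum(etas)
    return [eta / total for eta in etas]


T, eta0 = 100, 1.0  # assumed example values

# Logarithmic step size from the paper: eta0 * (1 - ln t / ln T).
log_etas = [eta0 * (1.0 - math.log(t) / math.log(T)) for t in range(1, T + 1)]
# Cosine step size in its common form (assumed): eta0 / 2 * (1 + cos(t * pi / T)).
cos_etas = [eta0 / 2.0 * (1.0 + math.cos(math.pi * t / T)) for t in range(1, T + 1)]

log_w = normalized_weights(log_etas)
cos_w = normalized_weights(cos_etas)

# Print the two weights at a late iteration so they can be compared.
print(f"t=90  log: {log_w[89]:.6f}  cosine: {cos_w[89]:.6f}")
```

Plotting `log_w` and `cos_w` against $t$ gives curves of the kind the caption describes, up to the assumptions listed above.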

Theorems & Definitions (14)

Definition 2.1
Lemma 2.1
Lemma 3.1
Lemma 3.2
Lemma 3.3
Theorem 3.1
proof
Corollary 3.1
Corollary 3.2
proof
...and 4 more
