Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction

Wei Jiang; Sifan Yang; Wenhao Yang; Lijun Zhang

TL;DR

This work studies non-convex stochastic optimization with sign-based updates and introduces Sign-based Stochastic Variance Reduction (SSVR), which tracks gradients with variance-reduced estimators and updates using only their signs. SSVR attains a convergence rate of $O(d^{1/2}T^{-1/3})$ for general non-convex objectives, improving on the $O(d^{1/2}T^{-1/4})$ rate of signSGD, and $O(m^{1/4}d^{1/2}T^{-1/2})$ for finite-sum problems with $m$ component functions. In distributed settings with heterogeneous data, the authors extend the method to SSVR-MV with majority vote and obtain rates of $O(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $O(d^{1/4}T^{-1/4})$, where $n$ is the number of nodes. Experiments on CIFAR-10/100 are consistent with the theory, showing improved convergence and accuracy with 1-bit communication.
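The combination of a variance-reduced gradient estimator with a sign update can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a STORM-style recursive estimator $v_t = g(x_t;\xi_t) + (1-\beta)\,(v_{t-1} - g(x_{t-1};\xi_t))$, which may differ in detail from the estimator used in the paper. The toy least-squares problem, the helper names (`make_sampler`, `stoch_grad`, `mean_grad`, `ssvr_sketch`), and the default batch sizes are hypothetical and chosen only for illustration.

```python
import random


def sign(v):
    """Elementwise sign of a list of floats (0 maps to 0)."""
    return [(z > 0) - (z < 0) for z in v]


def make_sampler(x_true, rng):
    """Hypothetical data source: draw() returns (a, b) with b = <a, x_true>."""
    def draw():
        a = [rng.gauss(0.0, 1.0) for _ in x_true]
        b = sum(ai * xi for ai, xi in zip(a, x_true))
        return a, b
    return draw


def stoch_grad(x, sample):
    """Gradient of 0.5 * (<a, x> - b)^2 for a single sample (a, b)."""
    a, b = sample
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [residual * ai for ai in a]


def mean_grad(x, batch):
    """Average of the per-sample gradients over a mini-batch."""
    grads = [stoch_grad(x, s) for s in batch]
    return [sum(col) / len(batch) for col in zip(*grads)]


def ssvr_sketch(draw, x0, T, eta, beta, B0=8, B1=2):
    """Sign update driven by an assumed STORM-style variance-reduced estimator."""
    x_prev = list(x0)
    # Initial estimate from a larger batch of size B0.
    v = mean_grad(x_prev, [draw() for _ in range(B0)])
    x = [xi - eta * si for xi, si in zip(x_prev, sign(v))]
    for _ in range(T - 1):
        batch = [draw() for _ in range(B1)]
        g_new = mean_grad(x, batch)       # fresh samples at the current iterate
        g_old = mean_grad(x_prev, batch)  # the same samples at the previous iterate
        # Recursive correction: reuse v and cancel the shared sampling noise.
        v = [gn + (1.0 - beta) * (vi - go) for gn, vi, go in zip(g_new, v, g_old)]
        # Only the sign of the estimate is used to move the iterate.
        x_prev, x = x, [xi - eta * si for xi, si in zip(x, sign(v))]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x_true = [1.0, -2.0, 0.5]
    x = ssvr_sketch(make_sampler(x_true, rng), [0.0] * 3, T=1500, eta=0.01, beta=0.2)
    print("final iterate:", x)
```

Evaluating the same mini-batch at both the current and the previous iterate is what lets the correction term cancel most of the sampling noise; every coordinate still moves by exactly `eta` per step, as in plain signSGD.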

Abstract

Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updating. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the iteration number. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and leverages their signs to update. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.
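The majority-vote aggregation mentioned for the distributed setting can be sketched as follows. This is a minimal illustration of plain deterministic majority vote, in which each node sends one sign bit per coordinate and the server returns the sign of the summed votes. The algorithms proposed in the paper for heterogeneous data may use a different sign operator or additional steps, so the function names `majority_vote` and `mv_step` and the exact aggregation rule here are assumptions.

```python
def sign(z):
    """Scalar sign, mapping 0 to 0."""
    return (z > 0) - (z < 0)


def majority_vote(local_vectors):
    """Aggregate per-node vectors using only their coordinate-wise signs."""
    d = len(local_vectors[0])
    # Node -> server: each node transmits one sign per coordinate.
    votes = [[sign(z) for z in v] for v in local_vectors]
    # Server -> nodes: sign of the vote total for each coordinate.
    return [sign(sum(node[i] for node in votes)) for i in range(d)]


def mv_step(x, local_vectors, eta):
    """One sign-based update using the majority-voted direction."""
    s = majority_vote(local_vectors)
    return [xi - eta * si for xi, si in zip(x, s)]


if __name__ == "__main__":
    # Three hypothetical nodes with local gradient estimates in 2 dimensions.
    estimates = [[0.3, -1.0], [2.0, -0.1], [-5.0, -4.0]]
    print(majority_vote(estimates))
    print(mv_step([0.0, 0.0], estimates, eta=0.1))
```

In this example the third node's large negative first coordinate is outvoted by the other two nodes, which shows how majority vote differs from averaging the raw estimates.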

Paper Structure (24 sections, 14 theorems, 113 equations, 5 figures, 2 tables, 3 algorithms)

Introduction
Related work
SignSGD and its variants
Stochastic variance reduction methods
The proposed methods
Sign-based stochastic variance reduction
Sign-based stochastic variance reduction for finite-sum structure
Sign-based stochastic variance reduction with majority vote
Experiments
Evaluation of SSVR and SSVR-FS methods in the centralized environment
Evaluation of SSVR-MV method in the distributed learning
Conclusion
Proof of Theorem 1
Proof of Theorem 2
Proof of Theorem 3
...and 9 more sections

Key Result

Theorem 1

Under Assumptions 2 and 3, by setting $\beta = \mathcal{O}(\frac{1}{T^{2/3}})$, $\eta = \mathcal{O}(\frac{1}{d^{1/2} T^{2/3}})$, $B_0 = \mathcal{O}(T^{1/3})$, and $B_1=\mathcal{O}(1)$, our SSVR method ensures:

Figures (5)

Figure 1: Results for CIFAR-10 dataset in the centralized environment.
Figure 2: Results for CIFAR-100 dataset in the distributed environment.
Figure 3: Results for CIFAR-10 dataset with different learning rates.
Figure 4: Results for CIFAR-10 dataset with different $\beta$.
Figure 5: Results for CIFAR-10 dataset with different batch sizes.

Theorems & Definitions (15)

Theorem 1
Theorem 2
Definition 1
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Lemma 1
Lemma 2
Lemma 3
...and 5 more
