SignSGD with Federated Voting

Chanho Park, H. Vincent Poor, Namyoon Lee

TL;DR

The proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes. The paper also provides a unified convergence rate analysis framework that covers both cases: the estimated weights being known to the parameter server perfectly or imperfectly.

Abstract

Distributed learning is commonly used for accelerating model training by harnessing the computational capabilities of multiple edge devices. However, in practical applications, the communication delay emerges as a bottleneck due to the substantial information exchange required between workers and a central parameter server. SignSGD with majority voting (signSGD-MV) is an effective distributed learning algorithm that can significantly reduce communication costs by one-bit quantization. However, when heterogeneous computational capabilities cause the mini-batch sizes to differ among workers, it fails to converge. To overcome this, we propose a novel signSGD optimizer with federated voting (signSGD-FV). The idea of federated voting is to exploit learnable weights to perform weighted majority voting. The server learns the weights assigned to the edge devices in an online fashion based on their computational capabilities. Subsequently, these weights are employed to decode the signs of the aggregated local gradients so as to minimize the sign decoding error probability. We provide a unified convergence rate analysis framework applicable to scenarios where the estimated weights are known to the parameter server either perfectly or imperfectly. We demonstrate that the proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes. Experimental results show that signSGD-FV outperforms signSGD-MV, exhibiting a faster convergence rate, especially when mini-batch sizes are heterogeneous.
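The difference between plain majority voting and weighted majority voting can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example sign matrix, and the per-worker error probabilities `error_probs` are all hypothetical, and the weights are taken to be log-likelihood ratios $\ln((1-p_m)/p_m)$, the standard choice when each worker's sign is flipped independently with a known probability $p_m$. The online learning of the weights described in the abstract is not shown.

```python
import numpy as np

def majority_vote(signs):
    """Unweighted vote over workers; signs has shape (M, N), entries in {-1, +1}."""
    total = signs.sum(axis=0)
    return np.where(total >= 0, 1, -1)

def weighted_majority_vote(signs, error_probs):
    """Weighted vote using log-likelihood-ratio weights ln((1 - p_m) / p_m).

    error_probs[m] is an assumed probability that worker m reports the wrong
    sign; in this sketch it is given, not learned.
    """
    p = np.clip(np.asarray(error_probs, dtype=float), 1e-12, 1 - 1e-12)
    weights = np.log((1.0 - p) / p)
    total = weights @ signs  # weighted sum per gradient coordinate
    return np.where(total >= 0, 1, -1)

# Hypothetical example: worker 0 is reliable, workers 1 and 2 are nearly random.
signs = np.array([[+1, -1, +1],
                  [-1, +1, +1],
                  [-1, +1, -1]])
error_probs = [0.05, 0.45, 0.45]

mv = majority_vote(signs)
wmv = weighted_majority_vote(signs, error_probs)
print("majority vote:         ", mv)
print("weighted majority vote:", wmv)
```

In this toy case the two unreliable workers outvote the reliable one under plain majority voting, while the weighted vote follows the reliable worker on every coordinate.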

Paper Structure

This paper contains 26 sections, 7 theorems, 57 equations, 6 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 1

Let $A_n^t:\{-1,+1\}^M\rightarrow \{-1,+1\}$ be a binary sign aggregation function applied to the $n$th gradient component at iteration $t$. This function produces an estimate $\hat{U}_n^t$ of the true gradient sign $U_n^t$. The maximum of the sign decoding error probability over all coordinates and iterations is $\max_{n,t} \mathbb{P}[\hat{U}_n^t \neq U_n^t]$. Then, w...
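The sign decoding error probability of an aggregation function can be estimated empirically. The sketch below does this by Monte Carlo simulation for a single coordinate and iteration, under an assumed noise model that is not taken from the paper: each of the $M$ workers reports the true sign flipped independently with a hypothetical probability `error_probs[m]`. The helper names are illustrative only.

```python
import numpy as np

def estimate_decoding_error(aggregate, error_probs, trials=20000, seed=0):
    """Monte Carlo estimate of P[decoded sign != true sign] for one coordinate.

    aggregate maps an (M, trials) array of {-1, +1} worker signs to a
    (trials,) array of decoded signs. The independent-flip model is an
    assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(error_probs, dtype=float)
    true_sign = rng.choice([-1, 1], size=trials)
    flips = rng.random((p.size, trials)) < p[:, None]
    worker_signs = np.where(flips, -true_sign, true_sign)
    decoded = aggregate(worker_signs)
    return float(np.mean(decoded != true_sign))

error_probs = np.array([0.05, 0.45, 0.45])  # hypothetical worker reliabilities
llr_weights = np.log((1 - error_probs) / error_probs)

def majority(signs):
    return np.where(signs.sum(axis=0) >= 0, 1, -1)

def weighted(signs):
    return np.where(llr_weights @ signs >= 0, 1, -1)

err_mv = estimate_decoding_error(majority, error_probs)
err_wmv = estimate_decoding_error(weighted, error_probs)
print(f"majority vote error:          {err_mv:.3f}")
print(f"weighted majority vote error: {err_wmv:.3f}")
```

Under these assumed reliabilities the weighted rule effectively follows the most reliable worker, so its estimated error should sit near 0.05, while the unweighted rule's error is substantially higher.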

Figures (6)

  • Figure 1: An illustration of signSGD-FV.
  • Figure 2: The coding-theoretic interpretation of signSGD-FV.
  • Figure 3: Test accuracy vs. training rounds varying the batch mode with $M = 15$ and $T_\mathsf{in} = 100$.
  • Figure 4: Test accuracy vs. training rounds on signSGD-FV varying the number of workers where the batch mode is 3 and $T_\mathsf{in} = 100$.
  • Figure 5: Test accuracy comparison on CIFAR-10 dataset by varying the uncertainty of estimated LLR weights for the batch mode 3 and $M = 15$.
  • ...and 1 more figure

Theorems & Definitions (19)

  • Theorem 1: Universal convergence rate
  • proof
  • Lemma 1: Large deviation bound
  • proof
  • Theorem 2: Decoding error bound of WMV aggregation
  • proof
  • Corollary 1
  • Lemma 2: Upper bound on the computing error probability
  • proof
  • Corollary 2: Decoding error bound with mini-batch sizes
  • ...and 9 more