A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent

Shuche Wang, Vincent Y. F. Tan

TL;DR

This work addresses corruption-tolerant distributed gradient descent in the presence of adversarial gradient corruptions and channel noise. It introduces RDGD, a lazy mirror-descent-based algorithm, and extends it to strongly convex losses via RDGD-SC. A restart variant, RDGD-Restart, combines fast initial convergence with amortization of the corruption over time. The convergence bounds show how the corruption budget $C(T)$ enters the error: $O\left(\frac{1}{\sqrt{T}} + \frac{C(T)}{m\sqrt{T}}\right)$ for smooth convex losses, and $O\left(\rho^t + \frac{C(t)}{m}\right)$ or $O\left(\frac{1}{t^2} + \frac{C(t)}{m\sqrt{t}}\right)$ for strongly convex losses, depending on the stepsize. Empirical tests on least squares, L2-SVM, and MNIST demonstrate RDGD's robustness under various adversarial scenarios, including periodic and targeted attacks, with the restart strategy accelerating convergence. Overall, the approach offers a principled way to mitigate worst-case corruptions in distributed optimization, with practical implications for large-scale learning systems facing Byzantine-like faults.
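
The trade-off that motivates the restart variant can be read directly off the two strongly convex rates quoted above. The following is our own reading of those rates, not a derivation from the paper. It assumes that $C(t)$ is the cumulative corruption budget up to round $t$, and is therefore nondecreasing in $t$.

```latex
\begin{aligned}
&\text{(i)}\quad \rho^{t} + \frac{C(t)}{m}:
  \quad \rho^{t} \to 0 \text{ geometrically, but } \frac{C(t)}{m} \ge \frac{C(1)}{m} \text{ never shrinks with } t;\\[4pt]
&\text{(ii)}\quad \frac{1}{t^{2}} + \frac{C(t)}{m\sqrt{t}} \;\longrightarrow\; 0
  \quad \text{whenever } C(t) = o(\sqrt{t}),
  \text{ e.g. } C(t) = t^{1/4} \;\Rightarrow\; \frac{C(t)}{m\sqrt{t}} = \frac{t^{-1/4}}{m};\\[4pt]
&\text{smooth convex:}\quad \frac{1}{\sqrt{T}} + \frac{C(T)}{m\sqrt{T}} \;\longrightarrow\; 0
  \quad \text{whenever } C(T) = o(\sqrt{T}).
\end{aligned}
```

Rate (i) is fast early on but retains a corruption floor. Rate (ii) amortizes the corruption over time but decays only polynomially. A scheme that exploits the first regime before switching to the second is consistent with the TL;DR's description of RDGD-Restart as combining fast initial convergence with corruption amortization.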

Abstract

Distributed gradient descent algorithms have come to the fore in modern machine learning, especially in parallelizing the handling of large datasets that are distributed across several workers. However, scant attention has been paid to analyzing the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions instead of random noise. In this paper, we formulate a novel problem in which adversarial corruptions are present in a distributed learning system. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Extensive convergence analysis for (strongly) convex loss functions is provided for different choices of the stepsize. We carefully optimize the stepsize schedule to accelerate the convergence of the algorithm, while at the same time amortizing the effect of the corruption over time. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
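
The core idea, lazy mirror descent driven by noisy and possibly corrupted worker gradients, can be illustrated with a small sketch. This is not the authors' RDGD. It makes the following assumptions:

- The mirror map is Euclidean, $\psi(x) = \|x\|^2/2$, on an $\ell_2$ ball, so the mirror step reduces to a projection.
- The stepsize is constant.
- Gaussian noise is added only to the uplink gradient messages.
- The corruption budget caps the cumulative Euclidean norm of everything the adversary injects.

All function names (`lazy_md_distributed`, `budgeted_adversary`, `make_quadratic_grad`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
GradFn = Callable[[Vector], Vector]
# Hypothetical corruption callback: (round t, worker i, iterate x) -> additive corruption vector.
CorruptFn = Callable[[int, int, Vector], Vector]


def project_l2_ball(z: Sequence[float], radius: float) -> Vector:
    """Mirror step for psi(x) = ||x||^2 / 2 on {||x|| <= radius}:
    argmin_x {psi(x) - <z, x>} is the Euclidean projection of z onto the ball."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= radius:
        return list(z)
    return [v * radius / norm for v in z]


def lazy_md_distributed(
    worker_grads: Sequence[GradFn],
    dim: int,
    steps: int,
    eta: float,
    radius: float,
    noise_std: float = 0.0,
    corrupt: Optional[CorruptFn] = None,
    seed: int = 0,
) -> Tuple[Vector, Vector]:
    """Lazy mirror descent (dual averaging) with noisy, possibly corrupted
    worker gradients. Returns (last iterate, average of the queried iterates)."""
    rng = random.Random(seed)
    m = len(worker_grads)
    z = [0.0] * dim                    # dual variable: -eta * sum of aggregated gradients
    x = project_l2_ball(z, radius)     # primal iterate broadcast to the workers
    x_sum = [0.0] * dim
    for t in range(steps):
        x_sum = [s + xj for s, xj in zip(x_sum, x)]
        messages = []
        for i, grad in enumerate(worker_grads):
            g = grad(x)
            if noise_std > 0:          # assumed Gaussian channel noise on the uplink
                g = [v + rng.gauss(0.0, noise_std) for v in g]
            if corrupt is not None:    # adversarial additive corruption
                g = [v + c for v, c in zip(g, corrupt(t, i, x))]
            messages.append(g)
        agg = [sum(col) / m for col in zip(*messages)]   # server averages the m messages
        z = [zj - eta * gj for zj, gj in zip(z, agg)]    # lazy step: accumulate in the dual
        x = project_l2_ball(z, radius)                    # map back to the primal
    return x, [s / steps for s in x_sum]


def budgeted_adversary(budget: float, direction: Sequence[float],
                       per_step: float = 1.0, target: int = 0) -> CorruptFn:
    """Hypothetical adversary: corrupts worker `target` along `direction` until
    the cumulative Euclidean norm of its corruptions reaches `budget`."""
    norm = math.sqrt(sum(v * v for v in direction))
    unit = [v / norm for v in direction]
    spent = 0.0

    def corrupt(t: int, i: int, x: Vector) -> Vector:
        nonlocal spent
        amount = min(per_step, budget - spent)
        if i != target or amount <= 0:
            return [0.0] * len(unit)
        spent += amount
        return [amount * u for u in unit]

    return corrupt


def make_quadratic_grad(a: Sequence[float]) -> GradFn:
    """Gradient of the toy local loss f_i(x) = ||x - a||^2 / 2."""
    return lambda x: [xj - aj for xj, aj in zip(x, a)]


if __name__ == "__main__":
    targets = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]  # global minimizer is the mean [2.0, 1.0]
    grads = [make_quadratic_grad(a) for a in targets]
    adv = budgeted_adversary(budget=5.0, direction=[1.0, 0.0])
    x_last, x_avg = lazy_md_distributed(grads, dim=2, steps=200, eta=0.1,
                                        radius=10.0, corrupt=adv)
    print(x_last, x_avg)
```

Gradients are accumulated in the dual variable `z` before being mapped back to the primal. As a result, a corruption with bounded total norm shifts `z` by a bounded amount. In this toy quadratic, the last iterate still approaches `[2.0, 1.0]` once the adversary's budget is spent.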

Paper Structure

This paper contains 32 sections, 84 equations, 12 figures, and 3 algorithms.

Figures (12)

  • Figure 1: Illustration of the distributed gradient descent setting with corruption over noisy channels. The variables indicated in blue are Gaussian noise terms, defined in \eqref{eq:noise_para} and \eqref{eqn:grad_noise}. The variables indicated in red are deterministic adversarial corruptions, specified in \eqref{liar} and subject to the bound in \eqref{eq:constraint}.
  • Figure 2: Illustration of primal and dual updates in RDGD.
  • Figure 3: Performance of RDGD and DGD on least-squares regression
  • Figure 4: Performance of different mirror maps $\psi$ on least-squares regression
  • Figure 5: Comparison of RDGD-SC under different stepsize schedules $\{\eta_k\}$ with RDGD-Restart