A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent
Shuche Wang, Vincent Y. F. Tan
TL;DR
This work addresses corruption-tolerant distributed gradient descent in the presence of adversarial gradient corruptions and channel noise. It introduces RDGD, an algorithm based on lazy mirror descent, extends it to strongly convex losses via RDGD-SC, and adds a restart variant, RDGD-Restart, that combines fast initial convergence with amortization of the corruption over time. The convergence bounds state explicitly how the error depends on the corruption budget: $O\left(\frac{1}{\sqrt{T}} + \frac{C(T)}{m\sqrt{T}}\right)$ for smooth convex losses, and $O\left(\rho^t + \frac{C(t)}{m}\right)$ or $O\left(\frac{1}{t^2} + \frac{C(t)}{m\sqrt{t}}\right)$ for strongly convex losses, depending on the stepsize. Experiments on least squares, L2-SVM, and MNIST show that RDGD remains robust under several adversarial scenarios, including periodic and targeted attacks, and that the restart strategy accelerates convergence. Overall, the approach offers a principled way to mitigate worst-case corruptions in distributed optimization, with practical implications for large-scale learning systems facing Byzantine-like faults.
Abstract
Distributed gradient descent algorithms have come to the fore in modern machine learning, especially in parallelizing the handling of large datasets that are distributed across several workers. However, scant attention has been paid to analyzing the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions instead of random noise. In this paper, we formulate a novel problem in which adversarial corruptions are present in a distributed learning system. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Extensive convergence analysis for (strongly) convex loss functions is provided for different choices of the stepsize. We carefully optimize the stepsize schedule to accelerate the convergence of the algorithm, while at the same time amortizing the effect of the corruption over time. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
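To make the lazy mirror descent idea concrete, the following is a minimal sketch and not the paper's RDGD algorithm. It assumes a Euclidean mirror map (so the mirror step reduces to projection onto an L2 ball), a least-squares loss sharded across workers, and a hypothetical adversary that perturbs one worker's gradient during the first 20 rounds only. All names (`project_l2_ball`, `lazy_mirror_descent`, `run_demo`) and all parameter values are illustrative assumptions.

```python
import numpy as np


def project_l2_ball(v, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)


def lazy_mirror_descent(A_parts, b_parts, steps, eta, radius, corruptions=None):
    """Lazy mirror descent with a Euclidean mirror map on sharded least squares.

    A_parts, b_parts: per-worker data (assumed setup, one shard per worker).
    corruptions: optional array of shape (steps, m, d) that is added to the
    workers' gradients before the server averages them (hypothetical adversary).
    """
    m = len(A_parts)
    d = A_parts[0].shape[1]
    z = np.zeros(d)  # dual iterate: accumulates all scaled gradients so far
    x = np.zeros(d)  # primal iterate: projection of z onto the feasible ball
    for t in range(steps):
        grads = []
        for i in range(m):
            A, b = A_parts[i], b_parts[i]
            g = A.T @ (A @ x - b) / len(b)  # local least-squares gradient
            if corruptions is not None:
                g = g + corruptions[t, i]  # adversarial perturbation
            grads.append(g)
        # "Lazy" update: step in the dual variable, then map back by projection.
        z = z - eta * np.mean(grads, axis=0)
        x = project_l2_ball(z, radius)
    return x


def run_demo(seed=0):
    """Compare a clean run with a run corrupted in the first 20 rounds."""
    rng = np.random.default_rng(seed)
    m, n, d, steps = 4, 50, 5, 500
    x_true = rng.normal(size=d)
    x_true /= np.linalg.norm(x_true)  # unit-norm target, well inside the ball
    A_parts = [rng.normal(size=(n, d)) for _ in range(m)]
    b_parts = [A @ x_true for A in A_parts]  # noiseless labels

    corruptions = np.zeros((steps, m, d))
    corruptions[:20, 0, :] = 20.0 * np.ones(d) / np.sqrt(d)  # worker 0 only

    x_clean = lazy_mirror_descent(A_parts, b_parts, steps, eta=0.1, radius=5.0)
    x_corrupt = lazy_mirror_descent(
        A_parts, b_parts, steps, eta=0.1, radius=5.0, corruptions=corruptions
    )
    return np.linalg.norm(x_clean - x_true), np.linalg.norm(x_corrupt - x_true)


err_clean, err_corrupt = run_demo()
print(f"clean error: {err_clean:.2e}, corrupted error: {err_corrupt:.2e}")
```

In this toy setting the corruption is confined to early rounds, so later clean gradients can pull the dual iterate back toward the optimum. This only illustrates the amortization idea in spirit; the paper's stepsize schedules and corruption model may differ from the constant stepsize and additive perturbation assumed here.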
