Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates

Ahmad Rammal; Kaja Gruntkowska; Nikita Fedin; Eduard Gorbunov; Peter Richtárik

TL;DR

A new Byzantine-robust method with compression, Byz-DASHA-PAGE, is proposed. It is proved to have a better convergence rate and a smaller neighborhood size in the heterogeneous case, and to tolerate more Byzantine workers under over-parametrization, than the previous method with SOTA theoretical convergence guarantees.

Abstract

Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning. These problems are usually huge-scale, implying that communication compression is also imperative for their resolution. These factors have spurred recent algorithmic and theoretical developments in the literature of Byzantine-robust learning with compression. In this paper, we contribute to this research area in two main directions. First, we propose a new Byzantine-robust method with compression, Byz-DASHA-PAGE, and prove that, compared with the previous method with SOTA theoretical convergence guarantees (Byz-VR-MARINA), it has a better convergence rate (for smooth non-convex and Polyak-Lojasiewicz optimization problems) and a smaller neighborhood size in the heterogeneous case, and it tolerates more Byzantine workers under over-parametrization. Second, we develop the first Byzantine-robust method with communication compression and error feedback, Byz-EF21, along with its bidirectional compression version, Byz-EF21-BC, and derive convergence rates for these methods in the smooth non-convex and Polyak-Lojasiewicz cases. We test the proposed methods and illustrate our theoretical findings in numerical experiments.
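
The second contribution combines error feedback with robust aggregation. As a minimal sketch, assuming an EF21-style round in which each honest worker sends the compressed correction $\mathcal{C}(\nabla f_i(x^{t+1}) - g_i^t)$, updates $g_i^{t+1} = g_i^t + \mathcal{C}(\nabla f_i(x^{t+1}) - g_i^t)$, and the server keeps a copy of every $g_i^t$ and aggregates the copies robustly, the code simulates Byzantine workers on simple quadratics. Everything here is hypothetical and is not the paper's Byz-EF21 pseudocode: the function names, the quadratic objectives $f_i(x) = \tfrac12\|x - b_i\|^2$, the attack (large random messages), and the coordinate-wise median, which stands in for a $(\delta, c)$-robust aggregator (the sketch does not claim the bare median satisfies Definition 1.1).

```python
import numpy as np


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Contractive Top-K compressor: keep the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def coordinate_median(vectors: np.ndarray) -> np.ndarray:
    """Coordinate-wise median; a hypothetical stand-in for a robust aggregator."""
    return np.median(vectors, axis=0)


def run_byz_ef21_sketch(n=10, n_byz=2, d=10, k=2, gamma=0.1, steps=500, seed=0):
    """Hypothetical EF21-style loop with robust aggregation.

    Objectives are f_i(x) = 0.5 * ||x - t_i||^2, so grad f_i(x) = x - t_i.
    Workers 0..n_byz-1 are Byzantine and send arbitrary messages.
    """
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(d)                        # common optimum
    targets = b + 0.01 * rng.standard_normal((n, d))  # mild heterogeneity
    byzantine = set(range(n_byz))

    x = np.zeros(d)
    g = x - targets  # server-side copies, initialized as g_i^0 = grad f_i(x^0)
    for _ in range(steps):
        # Model step using the robustly aggregated estimators.
        x = x - gamma * coordinate_median(g)
        for i in range(n):
            if i in byzantine:
                msg = 100.0 * rng.standard_normal(d)  # arbitrary (here: large random) message
            else:
                # EF21: compress only the gap between the fresh gradient and g_i^t.
                msg = top_k((x - targets[i]) - g[i], k)
            g[i] = g[i] + msg  # server applies the received correction
    return x, b


x_final, b_true = run_byz_ef21_sketch()
print("distance to common optimum:", np.linalg.norm(x_final - b_true))
```

Replacing `coordinate_median` with a plain mean lets the two Byzantine workers' large messages dominate the update, which is the failure mode robust aggregation is meant to prevent.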

Paper Structure (51 sections, 31 theorems, 128 equations, 10 figures, 3 tables, 5 algorithms)

INTRODUCTION
Technical Preliminaries
Robust aggregation.
Communication compression.
Assumptions.
Our Contributions
Related Work on Byzantine Robustness
METHODS WITH UNBIASED COMPRESSION
Warm-up: Byz-VR-MARINA 2.0
Byz-DASHA-PAGE
METHODS WITH BIASED COMPRESSION AND ERROR FEEDBACK
NUMERICAL EXPERIMENTS
Non-convex homogeneous setting.
Non-convex heterogeneous setting.
EXTRA RELATED WORK
...and 36 more sections

Key Result

Theorem 2.1

Let the smoothness, Hessian variance, local Hessian variance, and bounded heterogeneity assumptions hold. Assume that $0 < \gamma \leq (L + \sqrt{\eta})^{-1}$ and $\delta < \left( (8c + 4\sqrt{c})B \right)^{-1}$, and initialize $g_i^0 = \nabla f_i(x^0)$ for all $i \in \mathcal{G}$, where $\eta = \ldots$ … where $\delta^0 = f(x^0) - f^*$, $A = 1 - \left( 8c\delta + \sqrt{8c\delta/G} \right)B$ and $\widehat{x}$ …
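
The explicit conditions of Theorem 2.1 can be evaluated numerically. This is a minimal sketch under stated assumptions: the function name is hypothetical, $\eta$ is passed in directly because its definition is cut off above, $c$ is taken to be the constant of the $(\delta, c)$-robust aggregator (Definition 1.1), and $B$ and $G$ are assumed to be the bounded-heterogeneity constant and the number of good workers $|\mathcal{G}|$.

```python
import math


def theorem_2_1_params(L: float, eta: float, c: float, B: float, delta: float, G: int):
    """Hypothetical helper evaluating the explicit conditions of Theorem 2.1.

    eta is an input because its definition is truncated in this summary.
    Returns (largest admissible stepsize, threshold on delta, A).
    """
    # Stepsize condition: 0 < gamma <= (L + sqrt(eta))^{-1}.
    gamma_max = 1.0 / (L + math.sqrt(eta))
    # Byzantine-fraction condition: delta < ((8c + 4 sqrt(c)) B)^{-1}.
    delta_max = 1.0 / ((8.0 * c + 4.0 * math.sqrt(c)) * B)
    if not 0.0 <= delta < delta_max:
        raise ValueError("delta violates the condition of Theorem 2.1")
    # A = 1 - (8 c delta + sqrt(8 c delta / G)) B, as in the theorem statement.
    A = 1.0 - (8.0 * c * delta + math.sqrt(8.0 * c * delta / G)) * B
    return gamma_max, delta_max, A


print(theorem_2_1_params(L=1.0, eta=4.0, c=1.0, B=1.0, delta=0.01, G=10))
```

With these hypothetical values ($L = 1$, $\eta = 4$, $c = B = 1$, $\delta = 0.01$, $G = 10$), the stepsize bound is $1/3$, the threshold on $\delta$ is $1/12$, and $A = 0.92 - \sqrt{0.008} \approx 0.8306$.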

Figures (10)

Figure 1: Convergence in terms of the number of iterations in the homogeneous non-convex setting.
Figure 2: Convergence in terms of the number of bits sent in the heterogeneous non-convex setting.
Figure 3: Communication complexity comparison in the heterogeneous non-convex setting on the w8a dataset.
Figure 4: Communication complexity comparison in the heterogeneous strongly convex setting on the phishing dataset.
Figure 5: Communication complexity comparison in the heterogeneous strongly convex setting on the w8a dataset.
...and 5 more figures

Theorems & Definitions (57)

Definition 1.1: $(\delta, c)$-Robust Aggregator
Definition 1.2: Unbiased compressor
Definition 1.3: Contractive compressor
Theorem 2.1
Theorem 2.2
Theorem 3.1
Lemma C.1: Lemma 2 of li2021page
Lemma C.2: Lemma 5 of richtarik2021ef21
Lemma E.1: Bound on the variance
proof
...and 47 more
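
Definitions 1.2 and 1.3 are only named above. Assuming they take the standard forms from the compression literature, namely $\mathbb{E}[\mathcal{Q}(x)] = x$ and $\mathbb{E}\|\mathcal{Q}(x) - x\|^2 \le \omega\|x\|^2$ for an unbiased compressor, and $\mathbb{E}\|\mathcal{C}(x) - x\|^2 \le (1-\alpha)\|x\|^2$ for a contractive one, this sketch checks two textbook examples: scaled Rand-K (unbiased with $\omega = d/K - 1$) and Top-K (contractive with $\alpha = K/d$). The dimension, $K$, and sample count are illustrative choices, not values from the paper.

```python
import numpy as np


def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased Rand-K: keep k random coordinates and rescale them by d/k."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out


def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Contractive Top-K: keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


rng = np.random.default_rng(0)
d, k = 20, 5
x = rng.standard_normal(d)
sq_norm = float(np.sum(x ** 2))

# Monte Carlo estimates of E[Q(x)] and E||Q(x) - x||^2 for Rand-K.
samples = np.stack([rand_k(x, k, rng) for _ in range(20_000)])
mean_q = samples.mean(axis=0)
var_q = float(np.mean(np.sum((samples - x) ** 2, axis=1)))
omega = d / k - 1  # for scaled Rand-K the variance bound holds with equality

# Top-K is deterministic, so its contraction inequality can be checked exactly.
alpha = k / d
err_top = float(np.sum((top_k(x, k) - x) ** 2))

print("max |E[Q(x)] - x| ~", np.max(np.abs(mean_q - x)))
print("E||Q(x)-x||^2 ~", var_q, "vs omega*||x||^2 =", omega * sq_norm)
print("Top-K contraction holds:", err_top <= (1 - alpha) * sq_norm)
```

Rand-K pays for unbiasedness with variance that grows like $d/K$, while Top-K is biased but always contracts, which is why biased compressors are paired with error feedback (as in Byz-EF21) rather than used directly.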
