Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

Gilles Bareilles, Wassim Bouaziz, Julien Fageot, El-Mahdi El-Mhamdi

TL;DR

This work addresses the robustness of aggregation rules in Byzantine machine learning by introducing the optimal robustness coefficient $\kappa^\star$, a tight, optimization-based measure of how well an aggregator can bound the deviation from the honest mean under adversarial behavior. It provides the first formal robustness guarantees for the MultiKrum aggregator and derives both upper and lower bounds on its robustness coefficient, while also improving the best-known bounds for Krum. The results show that MultiKrum's bounds are never worse than Krum's and can be strictly better in realistic regimes, with a transition in performance governed by the number of Byzantine workers $f$ relative to the total number of workers $n$. The paper couples rigorous proofs (mean-variance relations and key lemmas) with experimental illustrations, offering practical insights for designing robust distributed learning systems and motivating further study of robustness coefficients for other aggregators. Overall, it advances the theoretical foundation of Byzantine-robust mean estimation and informs the choice of aggregation rules in adversarial settings.
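To make the quantity concrete, the Python sketch below evaluates, on one fixed input, the squared distance between an aggregator's output and the honest mean, divided by the empirical variance of the honest inputs. It assumes the $(f,\kappa)$-robustness convention of earlier work, where $F$ is $(f,\kappa)$-robust if $\|F(x_1,\dots,x_n) - \bar{x}_S\|^2 \le \frac{\kappa}{|S|}\sum_{i\in S}\|x_i - \bar{x}_S\|^2$ for every set $S$ of $n-f$ honest indices; under this reading, every evaluation of the ratio is a lower bound on $\kappa^\star$. The paper's exact definition may differ in detail, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Coordinate-wise average of a non-empty list of equal-length vectors."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def robustness_ratio(
    aggregate: Callable[[Sequence[Vector]], Vector],
    honest: Sequence[Vector],
    byzantine: Sequence[Vector],
) -> float:
    """Squared deviation of the aggregate from the honest mean, divided by the
    empirical variance of the honest inputs (ASSUMED definition, see text).

    Requires the honest inputs not to be all identical, otherwise the
    variance is zero and the ratio is undefined.
    """
    output = aggregate(list(honest) + list(byzantine))
    honest_mean = mean(honest)
    variance = sum(sq_dist(x, honest_mean) for x in honest) / len(honest)
    return sq_dist(output, honest_mean) / variance


if __name__ == "__main__":
    honest = [[0.0], [2.0], [0.0], [2.0]]  # mean 1, variance 1
    for b in (11.0, 101.0, 1001.0):
        # The plain average is dragged arbitrarily far by one Byzantine input.
        print(b, robustness_ratio(mean, honest, [[b]]))
```

With the plain average, $n=5$, $f=1$, honest inputs $0, 2, 0, 2$ (mean $1$, variance $1$) and a single Byzantine input $b$, the ratio equals $((b-1)/5)^2$: it is $4$ for $b=11$ and $400$ for $b=101$. Since it grows without bound, the plain average has $\kappa^\star = \infty$ and is not a robust aggregation rule.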

Abstract

Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied, and endowed with formal robustness and convergence guarantees. Yet, MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $\kappa^\star$, the optimal *robustness coefficient* of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries more tightly than previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum's robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum's robustness coefficient. We show that MultiKrum's bounds are never worse than Krum's, and better in realistic regimes. We illustrate this analysis with an experimental investigation of the quality of the lower bound.
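The abstract relies on Krum and MultiKrum without restating them. As a rough orientation, here is a pure-Python sketch of the usual description: each input is scored by the sum of squared distances to its nearest neighbours, Krum returns the lowest-scoring input, and MultiKrum with parameter $m$ averages the $m$ lowest-scoring inputs, so Krum is the case $m=1$ and $m$ ranges from $1$ to $n-f$ as in Figure 1. The neighbourhood size $n-f-1$ is an assumption (conventions differ across papers; the original Krum uses $n-f-2$), and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_scores(vectors: Sequence[Vector], f: int) -> List[float]:
    """Score of each vector: sum of squared distances to its n - f - 1 nearest
    other vectors (ASSUMPTION: the neighbourhood size varies across papers)."""
    n = len(vectors)
    k = n - f - 1
    if k < 1:
        raise ValueError("need n - f - 1 >= 1")
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(sq_dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def multikrum(vectors: Sequence[Vector], f: int, m: int) -> Vector:
    """Average of the m vectors with the lowest scores (m = 1 gives Krum)."""
    n = len(vectors)
    if not 1 <= m <= n - f:
        raise ValueError("m must lie between 1 and n - f")
    scores = krum_scores(vectors, f)
    # sorted() is stable, so ties between equal scores keep input order.
    selected = sorted(range(n), key=lambda i: scores[i])[:m]
    d = len(vectors[0])
    return [sum(vectors[i][c] for i in selected) / m for c in range(d)]


def krum(vectors: Sequence[Vector], f: int) -> Vector:
    """Krum returns the single vector with the lowest score."""
    return multikrum(vectors, f, 1)
```

On the one-dimensional input $0, 1, 2, 3, 100$ with $f = 1$, the scores are $14, 6, 6, 14, 28814$. Krum returns $1$, while MultiKrum with $m = n - f = 4$ averages $0, 1, 2, 3$ and returns $1.5$, which in this toy case equals the honest mean; it hints at why averaging several low-score inputs can help.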

Paper Structure (22 sections, 13 theorems, 72 equations, 2 figures, 1 table)

Key Result

Proposition 1

Consider a robust aggregation rule $F$, i.e., a rule such that $\kappa^\star < \infty$. Then $n-2f > 0$, and $\kappa^\star \ge \frac{f}{n-2f}.$
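Proposition 1 is straightforward to evaluate. The helper below (a hypothetical name) returns $f/(n-2f)$ and, following the first part of the proposition, returns infinity when $n - 2f \le 0$, since no aggregation rule can then be robust.

```python
import math


def universal_lower_bound(n: int, f: int) -> float:
    """Proposition 1: any robust rule satisfies kappa* >= f / (n - 2f).

    If n - 2f <= 0, the proposition implies no rule can be robust, so the
    optimal coefficient is infinite and math.inf is returned.
    """
    if n <= 0 or not 0 <= f <= n:
        raise ValueError("expected n > 0 and 0 <= f <= n")
    if n - 2 * f <= 0:
        return math.inf
    return f / (n - 2 * f)


if __name__ == "__main__":
    # The bound grows without limit as the Byzantine fraction f/n nears 1/2.
    for f in (0, 10, 25, 40, 49, 50):
        print(f"n=100, f={f}: kappa* >= {universal_lower_bound(100, f)}")
```

For the setting of Figure 1 ($n=100$, $f=10$), the bound reads $\kappa^\star \ge 10/80 = 0.125$.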

Figures (2)

  • Figure 1: Illustration of the upper and lower bounds on $\kappa^\star_m$, for $n=100$, $f=10$, and $m$ varying between $1$ and $n-f=90$.
  • Figure 2: Illustration of $m^\dagger(n, f) / n$, where $m^\dagger(n, f)$ denotes the smallest $m$ such that $(\sqrt{2}+1)^2>\kappa_2^>(m)$, and of the bound $(1 + \sqrt{2})^{-2}$ provided by Theorem 2 (bounds on the transition point).
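The constants in Figure 2 have simple closed forms, which makes the plotted reference level easy to check by hand:

$$(\sqrt{2}+1)^{2} = 3 + 2\sqrt{2} \approx 5.828, \qquad (1+\sqrt{2})^{-2} = \frac{1}{3+2\sqrt{2}} = \frac{3-2\sqrt{2}}{(3+2\sqrt{2})(3-2\sqrt{2})} = 3 - 2\sqrt{2} \approx 0.1716.$$

For $n = 100$, the reference level $(1+\sqrt{2})^{-2}\,n$ is therefore about $17.2$.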

Theorems & Definitions (28)

  • Definition 1
  • Remark 1
  • Definition 2
  • Proposition 1: Universal lower bound
  • Theorem 1
  • Theorem 2: Bounds on the transition point
  • Proof of Theorem 2
  • Theorem 3: Krum's lower bound
  • Proof of Theorem 3
  • Theorem 4: $(n-f)$-MultiKrum's lower bound
  • ...and 18 more