Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
Gilles Bareilles, Wassim Bouaziz, Julien Fageot, El-Mahdi El-Mhamdi
TL;DR
This work studies the robustness of aggregation rules in Byzantine machine learning. It introduces the optimal robustness coefficient $\kappa^\star$, a tight, optimization-based measure of how well an aggregator bounds the deviation from the honest mean under adversarial behavior. It provides the first formal robustness guarantees for the MultiKrum aggregator, derives upper and lower bounds on its robustness coefficient, and improves the best-known bounds for Krum. MultiKrum's bounds are never worse than Krum's and can be strictly better in realistic regimes, with a transition in performance governed by the number of Byzantine workers $f$ relative to the total number of workers $n$. The paper couples rigorous proofs (mean-variance relations and key lemmas) with experimental illustrations, offers practical insights for designing robust distributed learning systems, and motivates further study of robustness coefficients for other aggregators. Overall, it strengthens the theoretical foundation of Byzantine-robust mean estimation and informs the choice of aggregation rules in adversarial settings.
Abstract
Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied and endowed with formal robustness and convergence guarantees. Yet MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $\kappa^\star$, the optimal *robustness coefficient* of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries more tightly than previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum's robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum's robustness coefficient. We show that MultiKrum's bounds are never worse than Krum's, and better in realistic regimes. We illustrate this analysis with an experimental investigation of the quality of the lower bound.
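
For readers unfamiliar with the two aggregation rules, the following is a minimal Python sketch of how Krum and MultiKrum are commonly described in the literature. It is not taken from this paper. It assumes each vector is scored by the sum of squared distances to its $n - f - 2$ nearest neighbours, and that MultiKrum averages the `m` lowest-scoring vectors. The paper's exact conventions may differ, and the names `sq_dist`, `krum_scores`, `multi_krum` and the parameter `m` are illustrative only.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def krum_scores(vectors, f):
    """Score each vector by the summed squared distance to its nearest neighbours.

    Assumption: the neighbour count is n - f - 2, a common convention;
    other presentations may use a different count.
    """
    n = len(vectors)
    k = n - f - 2
    if k < 1:
        raise ValueError("need n > f + 2")
    scores = []
    for i, x in enumerate(vectors):
        # Distances from x to every other vector, smallest first.
        dists = sorted(sq_dist(x, y) for j, y in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def multi_krum(vectors, f, m):
    """Average the m vectors with the smallest scores (m = 1 recovers Krum)."""
    n = len(vectors)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    scores = krum_scores(vectors, f)
    # sorted() is stable, so ties are broken by worker index.
    chosen = sorted(range(n), key=lambda i: scores[i])[:m]
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in chosen) / m for d in range(dim)]


# Toy example: five honest workers near (1, 0) and one Byzantine outlier.
honest = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [1.0, 0.1], [1.0, -0.1]]
vectors = honest + [[100.0, 100.0]]
krum_output = multi_krum(vectors, f=1, m=1)
multikrum_output = multi_krum(vectors, f=1, m=5)
```

In this toy example the outlier receives by far the largest score, so both rules ignore it. Krum returns a single honest vector, whereas MultiKrum with `m = 5` averages all five honest vectors and lands closer to the honest mean.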
