Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers
Yuhao Yi, Ronghui You, Hong Liu, Changxin Liu, Yuan Wang, Jiancheng Lv
TL;DR
The paper addresses Byzantine faults in distributed learning by casting robust aggregation as $1$-center and $1$-mean clustering with outliers. It introduces two $2$-approximation aggregators, CenterwO and MeanwO, that achieve near-optimal resilience under multiple robustness criteria, and shows that no single rule dominates under two conflicting attack types (sneak and siege). To resolve this, it proposes a two-phase framework, 2PRASHB, that generates two candidate models and lets honest workers vote to select the winner, balancing security and performance. The approach yields provable resilience guarantees and substantial empirical gains in image-classification tasks with both homogeneous and heterogeneous data distributions. Overall, the work offers a principled, rather than ad hoc, way to design and compare resilient aggregation rules for large-scale distributed learning.
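As a rough illustration of why factor-2 approximations to these clustering problems are cheap to obtain, the sketch below shows one standard construction for each problem that restricts candidate centers to the submitted update vectors. It assumes that $n$ workers submit updates and that at most $f$ of them are Byzantine, treated as outliers. The names `center_with_outliers` and `mean_with_outliers` are hypothetical, and this is not claimed to be the exact CenterwO or MeanwO procedure from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two update vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_indices(updates: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k updates closest to updates[i] (updates[i] has distance 0)."""
    order = sorted(range(len(updates)), key=lambda j: sq_dist(updates[i], updates[j]))
    return order[:k]


def center_with_outliers(updates: List[Vector], f: int) -> List[float]:
    """Factor-2 approximation to 1-center with f outliers (hypothetical helper).

    Each update x_i is a candidate center, scored by the (squared) radius needed
    to cover its n - f nearest updates. Any update inside an optimal ball lies
    within twice the optimal radius of the n - f updates in that ball, so the
    best-scoring candidate is within factor 2 of optimal.
    """
    k = len(updates) - f

    def radius_sq(i: int) -> float:
        return max(sq_dist(updates[i], updates[j]) for j in nearest_indices(updates, i, k))

    best = min(range(len(updates)), key=radius_sq)
    return list(updates[best])


def mean_with_outliers(updates: List[Vector], f: int) -> List[float]:
    """Factor-2 approximation to 1-mean with f outliers (hypothetical helper).

    Each update x_i is scored by the sum of squared distances to its n - f
    nearest updates. Averaging that bound over the optimal inlier set gives
    exactly twice the optimal cost, so the best candidate is within factor 2;
    returning the mean of its neighbourhood can only lower the cost further.
    """
    k = len(updates) - f

    def score(i: int) -> float:
        return sum(sq_dist(updates[i], updates[j]) for j in nearest_indices(updates, i, k))

    best = min(range(len(updates)), key=score)
    chosen = nearest_indices(updates, best, k)
    dim = len(updates[0])
    return [sum(updates[j][d] for j in chosen) / k for d in range(dim)]


if __name__ == "__main__":
    honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]
    updates = honest + [[50.0, -50.0]]  # one Byzantine update, f = 1
    print(center_with_outliers(updates, f=1))  # -> [1.0, 1.0]
    print(mean_with_outliers(updates, f=1))    # close to [1.0, 1.0125]
```

In both cases the far-away Byzantine update is excluded from the selected neighbourhood. This naive version is roughly quadratic in the number of workers; it is meant to make the approximation argument concrete rather than to be efficient.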
Abstract
Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur in large-scale distributed learning systems. Resilient aggregation mechanisms are the key to securing distributed learning against Byzantine machines. Although many resilient aggregation rules have been proposed, they are designed in an ad hoc manner, which makes it harder to compare, analyze, and improve the rules across performance criteria. This paper studies near-optimal aggregation rules based on clustering in the presence of outliers. Our outlier-robust clustering approach exploits geometric properties of the update vectors provided by workers. Our analysis shows that constant-factor approximations to the 1-center and 1-mean clustering problems with outliers yield near-optimal resilient aggregators for metric-based criteria, which have been proven crucial in the homogeneous and heterogeneous cases, respectively. In addition, we discuss two conflicting types of attacks under which no single aggregation rule is guaranteed to improve upon the naive average. Based on this discussion, we propose a two-phase resilient aggregation framework. We run image-classification experiments with a non-convex loss function. The proposed algorithms outperform previously known aggregation rules by a large margin under both homogeneous and heterogeneous data distributions among non-faulty workers. Code and appendix are available at https://github.com/jerry907/AAAI24-RASHB.
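The selection phase of a two-phase framework, in which two candidate models are produced and workers vote for one of them, can be pictured as a simple majority vote. The sketch below assumes, purely for illustration, that each worker scores both candidates with a loss on its own local data and votes for the lower one. The function `two_phase_select` and the per-worker loss callables are hypothetical and are not taken from the paper's code. Such a vote reflects honest preferences only when honest workers outnumber Byzantine ones and largely agree.

```python
from typing import Callable, List, Sequence

Model = Sequence[float]


def two_phase_select(
    candidate_a: Model,
    candidate_b: Model,
    local_losses: List[Callable[[Model], float]],
) -> Model:
    """Return the candidate preferred by a majority of workers (hypothetical helper).

    Each worker votes for the candidate with the lower loss on its local data;
    a worker indifferent between the two, or a tied tally, favours candidate_a.
    """
    votes_a = sum(1 for loss in local_losses if loss(candidate_a) <= loss(candidate_b))
    votes_b = len(local_losses) - votes_a
    return candidate_a if votes_a >= votes_b else candidate_b


if __name__ == "__main__":
    # Hypothetical honest workers whose local optima lie near 1.0.
    honest = [lambda w, t=t: (w[0] - t) ** 2 for t in (0.9, 1.0, 1.1)]
    # A Byzantine worker that always prefers the candidate farther from 1.0.
    byzantine = [lambda w: -(w[0] - 1.0) ** 2]
    print(two_phase_select([1.0], [5.0], honest + byzantine))  # -> [1.0]
```

Here three honest votes outweigh one adversarial vote, so the candidate near the honest workers' optima is kept regardless of which slot it occupies.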
