Byzantine-tolerant distributed learning of finite mixture models
Qiong Zhang, Yan Shuo Tan, Jiahua Chen
TL;DR
This work addresses robust distributed learning for finite mixture models under label-switching and Byzantine failures. It introduces Distance Filtered Mixture Reduction (DFMR), a density-based, Byzantine-tolerant aggregation that first filters out corrupted local estimates using pairwise $L^2$ distances between local densities via a Centre Of Attention (COAT) step, then applies Mixture Reduction (MR) on the filtered set. The authors establish that MR achieves the optimal $O_P(N^{-1/2})$ rate (with $N=nm$) when $n\ge m$ and is asymptotically equivalent to the global MLE when $m = o(n)$, and they prove that DFMR attains a rate of $O_P(N^{-1/2} + \alpha \rho n^{-1/2})$ with inflation factor $\rho$, matching oracle performance under mild conditions. Empirical results on simulated and real data corroborate the robustness and efficiency of DFMR across diverse Byzantine attack types and settings, including high-dimensional, multi-component mixtures. These findings enable reliable, one-round, Byzantine-tolerant aggregation for distributed unsupervised learning of mixtures, with practical implications for scalable and secure federated-style analytics.
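To make the pairwise $L^2$ distance between local densities concrete: for Gaussian mixtures it has a closed form, because the integral of a product of two Gaussian densities is a Gaussian density evaluated at the difference of their means, with the covariances summed. The sketch below assumes Gaussian components, which the summary above does not require, and represents a mixture as a hypothetical `(weights, means, covariances)` tuple. The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def gauss_overlap(mu1, cov1, mu2, cov2):
    """Integral of N(x; mu1, cov1) * N(x; mu2, cov2) dx = N(mu1; mu2, cov1 + cov2)."""
    d = mu1.shape[0]
    diff = mu1 - mu2
    cov = cov1 + cov2
    quad = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (quad + logdet + d * np.log(2.0 * np.pi)))

def cross_term(w1, mus1, covs1, w2, mus2, covs2):
    """Integral of f * g for two Gaussian mixtures f and g."""
    total = 0.0
    for a, m1, c1 in zip(w1, mus1, covs1):
        for b, m2, c2 in zip(w2, mus2, covs2):
            total += a * b * gauss_overlap(m1, c1, m2, c2)
    return total

def l2_distance(mix_f, mix_g):
    """L2 distance between two Gaussian mixture densities.

    Each mixture is a tuple (weights, means, covariances) with shapes
    (K,), (K, d) and (K, d, d). This layout is an assumption of the sketch.
    """
    ff = cross_term(*mix_f, *mix_f)
    gg = cross_term(*mix_g, *mix_g)
    fg = cross_term(*mix_f, *mix_g)
    # Clip at zero to guard against tiny negative values from rounding.
    return np.sqrt(max(ff - 2.0 * fg + gg, 0.0))

# Two local estimates of the same mixture with component labels swapped.
f = (np.array([0.3, 0.7]),
     np.array([[0.0], [3.0]]),
     np.array([[[1.0]], [[1.0]]]))
f_swapped = (np.array([0.7, 0.3]),
             np.array([[3.0], [0.0]]),
             np.array([[[1.0]], [[1.0]]]))
print(l2_distance(f, f_swapped))  # zero up to rounding error
```

Because the distance is computed between densities rather than between parameter vectors, permuting the component labels leaves it unchanged, which is why a density-based filter is unaffected by label switching.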
Abstract
Traditional statistical methods need to be updated to work with modern distributed data storage paradigms. A common approach is the split-and-conquer framework, which involves learning models on local machines and averaging their parameter estimates. However, this does not work for the important problem of learning finite mixture models, because subpopulation indices on each local machine may be arbitrarily permuted (the "label switching problem"). Zhang and Chen (2022) proposed Mixture Reduction (MR) to address this issue, but MR remains vulnerable to Byzantine failure, whereby a fraction of local machines may transmit arbitrarily erroneous information. This paper introduces Distance Filtered Mixture Reduction (DFMR), a Byzantine-tolerant adaptation of MR that is both computationally efficient and statistically sound. DFMR leverages the densities of local estimates to construct a robust filtering mechanism. By analysing the pairwise $L^2$ distances between the densities of local estimates, DFMR identifies and removes severely corrupted local estimates while retaining the majority of uncorrupted ones. We provide theoretical justification for DFMR, proving its optimal convergence rate and asymptotic equivalence to the global maximum likelihood estimate under standard assumptions. Numerical experiments on simulated and real-world data validate the effectiveness of DFMR in achieving robust and accurate aggregation in the presence of Byzantine failure.
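The filtering idea can be illustrated with a deliberately simplified sketch. The abstract does not specify the filtering rule, so the code below is not the authors' procedure. It is a hypothetical median-radius filter that assumes the pairwise distances between local estimates are already available as a symmetric matrix and that more than half of the machines are uncorrupted. The function name `median_radius_filter` and the threshold multiplier `c` are assumptions made for illustration.

```python
import numpy as np

def median_radius_filter(dist, c=2.0):
    """Keep the local estimates that lie close to a robustly chosen centre.

    dist : (m, m) symmetric matrix of pairwise distances between local estimates.
    c    : hypothetical threshold multiplier (an assumption of this sketch).

    Returns (centre index, array of kept indices).
    """
    dist = np.asarray(dist, dtype=float)
    # Radius needed by each machine to cover about half of all machines
    # (itself included). Under an honest majority, the machine with the
    # smallest such radius sits among the uncorrupted estimates.
    radii = np.median(dist, axis=1)
    centre = int(np.argmin(radii))
    kept = np.flatnonzero(dist[centre] <= c * radii[centre])
    return centre, kept

# Toy example: five honest estimates clustered together, two far-away outliers.
# Scalar positions stand in for local estimates; in the setting above the
# matrix would instead hold L2 distances between local densities.
positions = np.array([0.0, 0.1, 0.2, 0.1, 0.05, 10.0, 12.0])
dist = np.abs(positions[:, None] - positions[None, :])
centre, kept = median_radius_filter(dist)
print(centre, kept)
```

In this toy run the two outlying machines are removed, and one honest machine near the edge of the cluster is also dropped. This mirrors the abstract's wording that the filter retains the majority, not necessarily all, of the uncorrupted estimates. An aggregation step such as MR would then be applied only to the retained set.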
