Byzantine-tolerant distributed learning of finite mixture models

Qiong Zhang; Yan Shuo Tan; Jiahua Chen

TL;DR

This work addresses robust distributed learning for finite mixture models under label-switching and Byzantine failures. It introduces Distance Filtered Mixture Reduction (DFMR), a density-based, Byzantine-tolerant aggregation that first filters out corrupted local estimates using pairwise $L^2$ distances between local densities via a Centre Of Attention (COAT) step, then applies Mixture Reduction (MR) on the filtered set. The authors establish that MR achieves the optimal $O_P(N^{-1/2})$ rate (with $N=nm$) when $n\ge m$ and is asymptotically equivalent to the global MLE when $m = o(n)$, and they prove that DFMR attains a rate of $O_P(N^{-1/2} + \alpha \rho n^{-1/2})$ with inflation factor $\rho$, matching oracle performance under mild conditions. Empirical results on simulated and real data corroborate the robustness and efficiency of DFMR across diverse Byzantine attack types and settings, including high-dimensional, multi-component mixtures. These findings enable reliable, one-round, Byzantine-tolerant aggregation for distributed unsupervised learning of mixtures, with practical implications for scalable and secure federated-style analytics.
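The filter-then-aggregate idea can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: it assumes that the COAT is the local estimate whose ball covering at least half of all estimates has the smallest radius, and that the filter keeps every estimate within $\rho$ times that radius of the COAT. The function name `coat_filter`, its signature, and the toy data are hypothetical, and the Mixture Reduction step that would run on the retained set is omitted.

```python
import numpy as np

def coat_filter(dist, rho=2.0):
    """Return (COAT index, indices kept) from an (m, m) pairwise distance matrix.

    Assumed rule (not taken from the paper): the COAT is the estimate whose
    ball covering ceil(m / 2) estimates, itself included, has the smallest
    radius; estimates within rho * radius of the COAT are retained.
    """
    dist = np.asarray(dist, dtype=float)
    m = dist.shape[0]
    half = int(np.ceil(m / 2))
    # Row-wise sort: column half - 1 is the radius each candidate centre
    # needs to cover `half` estimates (the zero self-distance counts as one).
    radii = np.sort(dist, axis=1)[:, half - 1]
    coat = int(np.argmin(radii))
    keep = np.flatnonzero(dist[coat] <= rho * radii[coat])
    return coat, keep

# Toy stand-in: scalars play the role of local estimates and absolute
# differences play the role of L2 distances between local densities.
points = np.array([0.00, 0.11, -0.09, 0.04, 0.21, -0.06, 0.16, 5.0, -7.0, 9.0])
dist = np.abs(points[:, None] - points[None, :])
coat, keep = coat_filter(dist, rho=3.0)
```

In this toy run the seven clustered values are retained and the three outlying ones are dropped. The filter only consumes a distance matrix, so any distance between local densities could be plugged in.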

Abstract

Traditional statistical methods need to be updated to work with modern distributed data storage paradigms. A common approach is the split-and-conquer framework, which involves learning models on local machines and averaging their parameter estimates. However, this does not work for the important problem of learning finite mixture models, because subpopulation indices on each local machine may be arbitrarily permuted (the "label switching problem"). Zhang and Chen (2022) proposed Mixture Reduction (MR) to address this issue, but MR remains vulnerable to Byzantine failure, whereby a fraction of local machines may transmit arbitrarily erroneous information. This paper introduces Distance Filtered Mixture Reduction (DFMR), a Byzantine-tolerant adaptation of MR that is both computationally efficient and statistically sound. DFMR leverages the densities of local estimates to construct a robust filtering mechanism. By analysing the pairwise $L^2$ distances between the densities of local estimates, DFMR identifies and removes severely corrupted local estimates while retaining the majority of uncorrupted ones. We provide theoretical justification for DFMR, proving its optimal convergence rate and asymptotic equivalence to the global maximum likelihood estimate under standard assumptions. Numerical experiments on simulated and real-world data validate the effectiveness of DFMR in achieving robust and accurate aggregation in the presence of Byzantine failure.
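To see why a density-based distance sidesteps label switching, consider the special case of univariate Gaussian mixtures, where the $L^2$ distance has a closed form because $\int \phi(x;\mu_1,\sigma_1^2)\,\phi(x;\mu_2,\sigma_2^2)\,dx = \phi(\mu_1;\mu_2,\sigma_1^2+\sigma_2^2)$. The sketch below is an illustration under that assumption only; the function names `gauss_cross` and `l2_distance` are hypothetical, and the paper's setting is more general.

```python
import numpy as np

def gauss_cross(mu_a, var_a, mu_b, var_b):
    """Matrix with entries: integral of N(x; mu_a[i], var_a[i]) * N(x; mu_b[j], var_b[j])."""
    s = var_a[:, None] + var_b[None, :]   # summed variances
    d = mu_a[:, None] - mu_b[None, :]     # mean differences
    return np.exp(-0.5 * d**2 / s) / np.sqrt(2.0 * np.pi * s)

def l2_distance(w_f, mu_f, var_f, w_g, mu_g, var_g):
    """L2 distance between two univariate Gaussian mixture densities f and g."""
    w_f, mu_f, var_f = (np.asarray(a, dtype=float) for a in (w_f, mu_f, var_f))
    w_g, mu_g, var_g = (np.asarray(a, dtype=float) for a in (w_g, mu_g, var_g))
    ff = w_f @ gauss_cross(mu_f, var_f, mu_f, var_f) @ w_f   # integral of f^2
    gg = w_g @ gauss_cross(mu_g, var_g, mu_g, var_g) @ w_g   # integral of g^2
    fg = w_f @ gauss_cross(mu_f, var_f, mu_g, var_g) @ w_g   # integral of f*g
    # Clip tiny negative values caused by floating-point cancellation.
    return float(np.sqrt(max(ff - 2.0 * fg + gg, 0.0)))

# The same mixture with its component labels permuted is at distance
# (numerically) zero, whereas a shifted mixture is not.
d_same = l2_distance([0.3, 0.7], [-1.0, 2.0], [1.0, 0.5],
                     [0.7, 0.3], [2.0, -1.0], [0.5, 1.0])
d_diff = l2_distance([0.3, 0.7], [-1.0, 2.0], [1.0, 0.5],
                     [0.3, 0.7], [0.0, 3.0], [1.0, 0.5])
```

Because the distance is computed between densities rather than between parameter vectors, the ordering of components on each machine does not affect it.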

Paper Structure (49 sections, 20 theorems, 216 equations, 16 figures, 5 tables, 3 algorithms)

Introduction
Related work
Revisiting failure-free SC learning of finite mixture models
The mixture reduction estimator
Calculating the mixture reduction estimator
Theoretical guarantees for mixture reduction estimator
Notation
Assumptions
Rate of convergence and asymptotic normality
Distance-filtered mixture reduction under Byzantine failure
Formal definition of Byzantine failure
$L^2$ distances between mixture densities
Distance-filtered mixture reduction
Theoretical guarantees
Numerical experiments
...and 34 more sections

Key Result

Lemma 3.1

Let $\{X_1,\ldots,X_n\}$ be $n$ IID observations from a finite mixture with mixing distribution $G^*$ of known order $K$. Under the paper's assumptions (from the parameter-space assumption through the smoothness assumption), the MLE $\widehat{G} = \sum_{k=1}^{K} \widehat{w}_k \delta_{\widehat{\theta}_k}$ based on this sample is unique and satisfies ...

Figures (16)

Figure 1: Three Byzantine failure machines out of $m = 10$ machines. Failure-free local estimates (blue dots) are clustered, while corrupted ones (red triangles) are scattered. The COAT (yellow star) and the $4$ other local estimates closest to it lie inside the grey circle, and all of them are failure-free.
Figure 2: The $W_1$ values of the DFMR($\rho$) approach as a function of the inflation factor $\rho$ under varying failure types, failure rates, and numbers of local machines. The dotted lines represent the DFMR($\rho$) approach, the dashed lines indicate the performance of the Oracle approach, and the dash-dotted lines correspond to the performance of the COAT approach.
Figure 3: $W_1$ values of different methods as $N$ and $\alpha$ vary, with $m=100$ and MaxOmega$=0.3$.
Figure 4: $W_1$ values of different methods as $m$ varies, with $\alpha=0.1$ and MaxOmega$=0.3$. The local sample size is fixed at $n=5000$, so the total sample size $N=nm$ increases with $m$.
Figure 5: $W_1$ values of different methods as $m$ varies, with $\alpha=0.1$ and MaxOmega$=0.3$. The total sample size is fixed at $N=500K$, so the local sample size $n=N/m$ decreases with $m$.
...and 11 more figures

Theorems & Definitions (48)

Definition 3.1: Finite mixture model
Remark 3.1: The label switching problem and finite mixture model parameterisation
Remark 3.2
Remark 3.3
Remark 3.4
Lemma 3.1: Properties of local MLE
Theorem 3.2
Definition 4.1: Byzantine failure
Remark 4.1
Lemma 4.1: $L^2$ distance between mixture densities
...and 38 more
