Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms

Ye Tian; Haolei Weng; Yang Feng

TL;DR

The paper addresses unsupervised federated learning for mixture models by introducing FedGrEM, a federated gradient EM algorithm that handles task heterogeneity and adversarial contamination through communication- and computation-efficient local updates, combined with a central regularization that borrows strength across tasks. It develops a non-asymptotic theory for general mixture models that decomposes the estimation error into iterative convergence, aggregation, heterogeneity, and outlier costs. It then shows that, under sufficient task similarity and small contamination, FedGrEM can outperform local single-task EM estimators. The authors instantiate the theory on Gaussian Mixture Models (GMMs) and Mixtures of Regressions (MoRs), deriving explicit rates such as $\max_{k,r}\big(|\widehat{w}^{(k)[T]}_r - w^{(k)*}_r| \vee \|\widehat{\bm{\theta}}^{(k)[T]}_r - \bm{\theta}^{(k)*}_r\|_2\big) = \widetilde{\mathcal{O}}\big(\kappa_0^T + R^2\sqrt{d/(nK)} + R^2\sqrt{1/n} + \min\{h, R^2\sqrt{d/n}\} + \epsilon R^2\sqrt{d/n}\big)$ for GMMs, and analogous forms for MoRs, highlighting how heterogeneity and outliers affect convergence. Empirical results on simulations and real data validate the theory and demonstrate FedGrEM's superiority over the Local-EM, FedEM, FedGMM, TGMM, and pooled-EM/pooled-GrEM benchmarks. The paper also addresses practical issues such as label permutation alignment. Overall, the work provides a rigorous foundation for non-asymptotic, robust, and communication-efficient unsupervised federated learning in mixture-model settings, with implications for privacy-preserving multi-task inference.
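
To make the error decomposition in the GMM rate above concrete, the sketch below evaluates each term for one hypothetical setting. It drops the constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$. As a loud assumption, it uses the cap $R^2\sqrt{d/n}$ inside the $\min$ term as a stand-in for the single-task rate. The function names and all numeric values are illustrative and do not come from the paper.

```python
import math

def gmm_rate_terms(T, kappa0, R, d, n, K, h, eps):
    """Order-of-magnitude value of each term in the quoted GMM bound.

    Constants and log factors hidden by O-tilde are dropped, so only the
    relative sizes of the terms are meaningful.
    """
    base = R**2 * math.sqrt(d / n)
    return {
        "kappa0^T": kappa0**T,                              # iterative (optimization) error
        "R^2 sqrt(d/(nK))": R**2 * math.sqrt(d / (n * K)),  # shrinks with total sample size nK
        "R^2 sqrt(1/n)": R**2 * math.sqrt(1 / n),
        "min{h, R^2 sqrt(d/n)}": min(h, base),              # heterogeneity cost, capped
        "eps R^2 sqrt(d/n)": eps * base,                    # outlier (contamination) cost
    }

def compare(h, eps, T=50, kappa0=0.5, R=2, d=10, n=200, K=20):
    """Return (sum of bound terms, hypothetical single-task proxy)."""
    terms = gmm_rate_terms(T, kappa0, R, d, n, K, h, eps)
    federated = sum(terms.values())
    # Hypothetical single-task proxy: the cap R^2 sqrt(d/n) inside the min term
    local_proxy = R**2 * math.sqrt(d / n)
    return federated, local_proxy

fed_similar, local = compare(h=0.01, eps=0.05)  # similar tasks, few outliers
fed_dissimilar, _ = compare(h=10.0, eps=0.05)   # highly heterogeneous tasks
print(fed_similar, fed_dissimilar, local)
```

With small $h$, the summed terms fall below the proxy. With large $h$, the $\min$ term saturates at the proxy, so the bound no longer shows a gain from federation. Because the constants are dropped, this is only a qualitative reading of the rate.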

Abstract

While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. Several federated EM algorithms have gained popularity in practice, but their theoretical foundations are often lacking. In this paper, we first introduce a federated gradient EM algorithm (FedGrEM) designed for the unsupervised learning of mixture models. It supplements the existing federated EM algorithms by accounting for task heterogeneity and potential adversarial attacks. We present a comprehensive finite-sample theory that holds for general mixture models. We then apply this general theory to specific statistical models to characterize the explicit estimation errors of model parameters and mixture proportions. Our theory elucidates when and how FedGrEM outperforms local single-task learning, with insights that extend to existing federated EM algorithms. This bridges the gap between their practical success and theoretical understanding. Our numerical results validate our theory and demonstrate FedGrEM's superiority over existing unsupervised federated learning benchmarks.
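
The general pattern described above pairs local gradient-EM steps with a central step that borrows strength across tasks. The minimal sketch below illustrates that pattern under our own simplifying assumptions, and it is not the paper's exact FedGrEM update. The model is a symmetric two-component GMM $0.5\,N(\bm\theta, I) + 0.5\,N(-\bm\theta, I)$ with known equal weights. Each client shrinks toward a shared center with a non-squared $\ell_2$ penalty, handled by group soft-thresholding. The server then aggregates the client estimates with a coordinate-wise median. The penalty form, the median aggregation, and all function names are hypothetical.

```python
import numpy as np

def local_grad_em_step(theta, X, eta):
    """One gradient-EM step for the model 0.5*N(theta, I) + 0.5*N(-theta, I)."""
    # E-step: for this symmetric model, 2*P(z = +1 | x) - 1 = tanh(<theta, x>)
    s = np.tanh(X @ theta)
    # Instead of a full M-step, take one gradient step on the EM surrogate Q(. | theta)
    grad = (s[:, None] * X).mean(axis=0) - theta
    return theta + eta * grad

def shrink_toward(v, center, tau):
    """Proximal map of tau * ||theta - center||_2 (group soft-thresholding)."""
    diff = v - center
    norm = np.linalg.norm(diff)
    if norm <= tau:
        return center.copy()  # task estimate fully absorbed into the shared center
    return center + (1.0 - tau / norm) * diff

def federated_grad_em(datasets, theta0, T=50, eta=0.5, lam=0.05):
    """Hypothetical simplified federated gradient EM; returns client estimates and center."""
    thetas = [theta0.copy() for _ in datasets]
    center = theta0.copy()
    for _ in range(T):
        # Clients: one local gradient-EM step, then shrink toward the current center
        thetas = [
            shrink_toward(local_grad_em_step(th, X, eta), center, eta * lam)
            for th, X in zip(thetas, datasets)
        ]
        # Server: coordinate-wise median of client estimates (robust to a few outliers)
        center = np.median(np.stack(thetas), axis=0)
    return thetas, center

# Usage: K = 10 similar tasks, n = 200 samples each, dimension d = 5
rng = np.random.default_rng(0)
K, n, d = 10, 200, 5
theta_star = np.ones(d)
true_thetas = [theta_star + 0.1 * rng.standard_normal(d) for _ in range(K)]
datasets = []
for tk in true_thetas:
    z = rng.choice([-1.0, 1.0], size=n)  # latent component labels
    datasets.append(z[:, None] * tk + rng.standard_normal((n, d)))
estimates, center = federated_grad_em(datasets, theta0=0.5 * theta_star)
errors = [np.linalg.norm(est - tk) for est, tk in zip(estimates, true_thetas)]
print(max(errors))
```

The non-squared penalty makes each client either keep a shrunken version of its own estimate or collapse onto the center. This loosely mirrors the $\min\{h, \cdot\}$ behavior in the rate, but only as an analogy.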

Paper Structure (35 sections, 19 theorems, 225 equations, 3 figures, 8 tables)

Introduction
Federated Learning on Mixture of Distributions
Problem Setting
Related Works
A Federated Gradient EM: FedGrEM
Our Contributions
Theory
Generic Analysis
Proof Sketch of Theorem 3.6
Example 1: Gaussian Mixture Models (GMMs)
Example 2: Mixture of Regressions (MoRs)
Numerical Results
Simulations
Real-data Studies
Discussions
...and 20 more sections

Key Result

Theorem 3.6

Suppose Assumptions asmp: q, asmp: w and theta, and asmp: thm generic hold. Then for any contaminated set $S^c$ with $\epsilon=|S^c|/K < 1/3$ and any contamination distribution $\mathbb{Q}_{S^c}$, w.p. $1- {\mathcal{O}}(1)$, for all $T \geq 1$, FedGrEM satisfies where $\kappa_0 \in (0, 1)$.
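
Because $\kappa_0 \in (0, 1)$, the iterative term $\kappa_0^T$ (the first term of the GMM rate in the TL;DR) decays geometrically in the number of rounds $T$. Beyond the point where it drops below the remaining statistical terms, additional rounds do not improve the order of the bound. The sketch below computes the smallest such $T$ for a given target. In practice $\kappa_0$ is not known, so the inputs here are purely hypothetical, as is the function name.

```python
import math

def rounds_needed(kappa0, target):
    """Smallest integer T >= 1 with kappa0**T <= target (requires 0 < kappa0, target < 1)."""
    if not (0.0 < kappa0 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < kappa0 < 1 and 0 < target < 1")
    T = max(1, math.ceil(math.log(target) / math.log(kappa0)))
    # Correct possible floating-point rounding in the ratio of logarithms
    while kappa0**T > target:
        T += 1
    while T > 1 and kappa0 ** (T - 1) <= target:
        T -= 1
    return T

# Hypothetical target, e.g. the size of the statistical terms of the bound
print(rounds_needed(0.5, 0.01))  # 7
print(rounds_needed(0.9, 0.05))  # 29
```

A contraction factor closer to 1 sharply increases the number of rounds, which matters when each round costs a round of communication.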

Figures (3)

Figure 1: An illustration of the FedGrEM algorithm (iteration round $t$).
Figure 2: Schematic of the geometric convergence and the localization trick, in which the radius of the uniform convergence ball shrinks from $r_1^*$ to $r_0^*$ after the first iteration.
Figure 3: The average estimation errors of different methods over 100 replications of the GMM and MoR simulations (in $\log_e$ scale). The left two panels show the estimation error $\max_{k \in S}\max_{r \in [R]}\log(|\widehat{w}^{(k)[T]}_r - w^{(k)*}_r|)$, and the right two panels show the estimation error $\max_{k \in S}\max_{r \in [R]}\log(\|\widehat{\bm{\theta}}^{(k)[T]}_r - \bm{\theta}^{(k)*}_r\|_{2})$. The $x$-axis represents the ratio between the model heterogeneity $h$ and the SNR (signal-to-noise ratio), which is defined in the additional-simulations subsection of the appendix.
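
The Figure 3 metric compares estimated and true components index by index. Because mixture components are identified only up to a permutation of labels, this comparison presumes the estimated labels have first been matched to the true ones. The minimal evaluation sketch below makes that matching explicit. It assumes an exhaustive search over the $R!$ permutations, chosen by the $\bm\theta$ error and reused for the weights. The paper's own alignment procedure may differ, and the function names are hypothetical.

```python
import itertools
import math
import numpy as np

def aligned_errors(w_hat, theta_hat, w_true, theta_true):
    """Errors for one client after matching estimated components to true ones.

    w_* have shape (R,), theta_* have shape (R, d). The permutation minimizes
    max_r ||theta_hat[pi(r)] - theta_true[r]||_2 and is reused for the weights.
    Exhaustive search over R! permutations, so only suitable for small R.
    """
    best = None
    for perm in itertools.permutations(range(len(w_true))):
        p = list(perm)
        theta_err = np.linalg.norm(theta_hat[p] - theta_true, axis=1).max()
        if best is None or theta_err < best[1]:
            best = (np.abs(w_hat[p] - w_true).max(), theta_err)
    return best

def figure3_log_errors(estimates, truths):
    """log of max over clients k in S and components r (max of logs = log of max)."""
    pairs = [aligned_errors(*est, *tru) for est, tru in zip(estimates, truths)]
    return math.log(max(p[0] for p in pairs)), math.log(max(p[1] for p in pairs))

# Usage: one client (k in S) with R = 2 whose labels came out swapped
w_true = np.array([0.3, 0.7])
theta_true = np.array([[1.0, 0.0], [-1.0, 0.0]])
w_hat = np.array([0.72, 0.28])
theta_hat = np.array([[-0.9, 0.1], [1.1, 0.0]])
w_err, theta_err = aligned_errors(w_hat, theta_hat, w_true, theta_true)
log_w, log_theta = figure3_log_errors([(w_hat, theta_hat)], [(w_true, theta_true)])
print(w_err, theta_err)
```

Without alignment, the swapped labels in this example would report a $\bm\theta$ error of about 1.9 instead of about 0.14.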

Theorems & Definitions (27)

Remark 3.2
Remark 3.4
Theorem 3.6: Main result, a simplified version of the generic theorem in the appendix
Proposition 3.8
Corollary 3.9
Proposition 3.11
Corollary 3.12
Remark 1.2
Remark 1.4
Remark 1.6
...and 17 more