Network EM Algorithm for Gaussian Mixture Model in Decentralized Federated Learning

Shuyuan Wu, Bin Du, Xuetong Li, Hansheng Wang

TL;DR

Rigorous theoretical analysis demonstrates that MNEM can achieve statistical efficiency comparable to that of the whole sample estimator when the mixture components satisfy certain separation conditions, even in heterogeneous scenarios.

Abstract

We systematically study various network Expectation-Maximization (EM) algorithms for the Gaussian mixture model within the framework of decentralized federated learning. Our theoretical investigation reveals that directly extending the classical decentralized supervised learning method to the EM algorithm exhibits poor estimation accuracy with heterogeneous data across clients and struggles to converge numerically when Gaussian components are poorly-separated. To address these issues, we propose two novel solutions. First, to handle heterogeneous data, we introduce a momentum network EM (MNEM) algorithm, which uses a momentum parameter to combine information from both the current and historical estimators. Second, to tackle the challenge of poorly-separated Gaussian components, we develop a semi-supervised MNEM (semi-MNEM) algorithm, which leverages partially labeled data. Rigorous theoretical analysis demonstrates that MNEM can achieve statistical efficiency comparable to that of the whole sample estimator when the mixture components satisfy certain separation conditions, even in heterogeneous scenarios. Moreover, the semi-MNEM estimator enhances the convergence speed of the MNEM algorithm, effectively addressing the numerical convergence challenges in poorly-separated scenarios. Extensive simulation and real data analyses are conducted to justify our theoretical findings.
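To make the momentum idea in the abstract concrete, here is a minimal sketch of one way a momentum-style network EM update could look for a 1-D two-component GMM. It is not the paper's algorithm: the update rule (blend the neighbourhood-averaged previous estimators with a local EM step using weight `alpha`), the self-inclusive ring weight matrix, and all names (`local_em_step`, `ring_weights`, `mnem_sketch`, `alpha`) are assumptions made for illustration only.

```python
import numpy as np

def local_em_step(x, pi, mu, sigma2):
    """One EM step for a 1-D K-component GMM on local data x."""
    # E-step: responsibilities, computed in log space for numerical stability
    log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
             - 0.5 * (x[:, None] - mu) ** 2 / sigma2)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: weighted proportions, means, and variances
    nk = gamma.sum(axis=0) + 1e-12
    new_pi = nk / nk.sum()
    new_mu = gamma.T @ x / nk
    new_sigma2 = (gamma * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk + 1e-6
    return new_pi, new_mu, new_sigma2

def ring_weights(M):
    """Assumed circle-type network: average over self and the two ring neighbours."""
    W = np.zeros((M, M))
    for m in range(M):
        for j in (m - 1, m, m + 1):
            W[m, j % M] += 1.0 / 3.0
    return W

def mnem_sketch(data, W, theta0, alpha=0.1, n_iter=300):
    """Hypothetical momentum update: theta <- (1 - alpha) * neighbour average + alpha * local EM step."""
    M = len(data)
    theta = np.repeat(theta0[None], M, axis=0)  # shape (M, 3, K): rows are pi, mu, sigma2
    for _ in range(n_iter):
        # Historical information: each client averages its neighbours' previous estimators
        avg = np.einsum("mj,jrk->mrk", W, theta)
        # Current information: one local EM step started from the averaged estimator
        new = np.stack([np.stack(local_em_step(data[m], *avg[m])) for m in range(M)])
        theta = (1 - alpha) * avg + alpha * new
    return theta

# Heterogeneous toy data: each client has its own mixing proportion (an assumption)
rng = np.random.default_rng(0)
M, n = 10, 500
true_mu = np.array([-2.0, 2.0])
data = []
for p in np.linspace(0.2, 0.8, M):
    z = rng.random(n) < p  # True means the sample comes from the second component
    data.append(np.where(z, rng.normal(true_mu[1], 1.0, n),
                         rng.normal(true_mu[0], 1.0, n)))

theta0 = np.array([[0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]])
theta = mnem_sketch(data, ring_weights(M), theta0)
print("per-client mean estimates:\n", theta[:, 1, :])
```

No client ever sees another client's raw data; only parameter estimates travel along the network edges. In this sketch a smaller `alpha` puts more weight on the neighbourhood average and less on each client's own, possibly unrepresentative, local EM step.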

Paper Structure

This paper contains 4 theorems, 15 equations, 4 figures, 2 algorithms.

Key Result

Theorem 1

Assume that $X_1,\dots,X_N$ are independently and identically generated from the GMM (eq:gmm) with parameter $\theta_0$. Then we have: (i) $\|\dot{F}(\theta_0)\| \leq C_0 \max_k \sqrt{\tau_k}$ with probability at least $1 - O(N^{-\alpha})$ for any $\alpha \geq 1$ and some positive constant $C_0$ …

Figures (4)

  • Figure 1: The log(MSE) values of MNEM and EM ($r=0$), as well as semi-MNEM and semi-EM ($r>0$) for the circle-type network structure. Each $r$ value corresponds to two lines, where the curve represents the proposed method MNEM or semi-MNEM and the line represents the whole sample estimator EM or semi-EM. The upper and lower panels represent the homogeneous and heterogeneous data generating process respectively. The left, middle, and right panels represent $C=1,2,4$ respectively.
  • Figure 2: The log(MSE) values of MNEM, DEM, and NGD for different distribution patterns and separabilities of Gaussian components. The blue dotted line represents the whole sample EM estimators. Here, we fix the network structure to be a circle-type structure and $\eta=0.01$ for MNEM and DEM.
  • Figure 3: The log(MSE) values of semi-MNEM, semi-DEM, and semi-NGD for different distribution patterns and separabilities of Gaussian components. The blue dotted line represents the whole sample semi-EM estimators. Here, we fix the network structure to be a circle-type structure, the labeled ratio $r=0.1$, and $\eta=0.01$ for MNEM and DEM.
  • Figure 4: The mean and log(SD) for the Err values obtained by different estimators. The left panel represents the original GMM model, while the right panel shows the results for the counterparts of NGD, DEM, and MNEM with partial labeling. Here, we fix the network structure to be a circle-type structure with a proportion of 10% instances labeled. Moreover, the red dotted line in the upper panel represents the optimal Err value, which is calculated based on the entire dataset using a single computer.

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4