Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression

Zhankun Luo, Abolfazl Hashemi

Abstract

Mixture models have attracted significant attention due to their practical effectiveness and comprehensive theoretical foundations. A persisting challenge is model misspecification, which occurs when the model to be fitted has more mixture components than the data distribution. In this paper, we develop a theoretical understanding of the Expectation-Maximization (EM) algorithm's behavior in the context of targeted model misspecification for overspecified two-component Mixed Linear Regression (2MLR) with unknown $d$-dimensional regression parameters and mixing weights. In Theorem 5.1 at the population level, with an unbalanced initial guess for mixing weights, we establish linear convergence of regression parameters in $O(\log(1/ε))$ steps. Conversely, with a balanced initial guess for mixing weights, we observe sublinear convergence in $O(ε^{-2})$ steps to achieve $ε$-accuracy in Euclidean distance. In Theorem 6.1 at the finite-sample level, for mixtures with sufficiently unbalanced fixed mixing weights, we demonstrate a statistical accuracy of $O((d/n)^{1/2})$, whereas for those with sufficiently balanced fixed mixing weights, the accuracy is $O((d/n)^{1/4})$ given $n$ data samples. Furthermore, we underscore the connection between our population-level and finite-sample-level results: by setting the desired final accuracy $ε$ in Theorem 5.1 to match that in Theorem 6.1 at the finite-sample level, namely letting $ε= O((d/n)^{1/2})$ for sufficiently unbalanced fixed mixing weights and $ε= O((d/n)^{1/4})$ for sufficiently balanced fixed mixing weights, we intuitively derive iteration complexity bounds $O(\log (1/ε))=O(\log (n/d))$ and $O(ε^{-2})=O((n/d)^{1/2})$ at the finite-sample level for sufficiently unbalanced and balanced initial mixing weights, respectively. We further extend our analysis of the overspecified setting to the low-SNR regime.
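
The substitution behind these iteration complexity bounds can be written out as a short worked step. Constants are suppressed, so the equalities hold only in the $O(\cdot)$ sense, and the numerical illustration after the display is an assumption chosen for concreteness rather than a figure from the paper.

$$
\epsilon = (d/n)^{1/2} \;\Longrightarrow\; \log(1/\epsilon) = \tfrac{1}{2}\log(n/d) = O(\log(n/d)),
\qquad
\epsilon = (d/n)^{1/4} \;\Longrightarrow\; \epsilon^{-2} = (n/d)^{1/2}.
$$

For instance, taking $n/d = 10^4$ gives on the order of $\log(10^4) \approx 9.2$ steps in the unbalanced case and $(10^4)^{1/2} = 100$ steps in the balanced case, up to constant factors.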

Paper Structure

This paper contains 52 sections, 52 theorems, 283 equations, 5 figures.

Key Result

proposition 4

(Approximate Dynamic Equations) Let $\alpha^t := \| \theta^t \|/\sigma=\|M(\theta^{t-1},\nu^{t-1})\|/\sigma$ and $\beta^t := \tanh (\nu^t) = N(\theta^{t-1}, \nu^{t-1})$ for all $t\in\mathbb{Z}_+$ be the $t$-th iteration of the EM update rules $\| M (\theta, \nu) \| / \sigma$ and $N (\theta, \nu)$ at

Figures (5)

  • Figure 1: Left: EM trajectories are nearly perfect rays from the origin to the initial point, which aligns with the theoretical results in Identity \ref{prop:em}. Right: In the worst case, we show that $\alpha^t \geq 0.1$ for all $t \leq 9$ (see remark on the proof of Fact \ref{fact:init_popl} in Appendix \ref{sup:popl}, Subsection \ref{supsub:init_popl}) by using the theoretical matching lower bound for the worst case in Proposition \ref{prop:sublinear_convg}. Also, we demonstrate that $\alpha^t < 0.1$ for all $t \geq 36$ (Fact \ref{fact:init_popl}) by applying the theoretical upper bound in Proposition \ref{prop:sublinear_convg}. As $\alpha^{20} \approx 0.1$ by numerical evaluations, and $9 < 20 < 36$, the theoretical results are consistent with the numerical results shown in the figure.
  • Figure 2: Top: Relation between $(\alpha^t-\alpha^{t+1})/\alpha^t$ and $[\beta^t]^2$ given $\alpha^t = 0.1$ and $\beta^t \in\{0.1, 0.2, \cdots, 0.9, 1\}$. Bottom: Relation between $(\beta^t-\beta^{t+1})/\beta^t$ and $\alpha^t \alpha^{t+1}$ given $\alpha^t = 0.1$ and $\beta^t \in\{0.1, 0.2, \cdots, 0.9, 1\}$. The difference between $(\alpha^t-\alpha^{t+1})/\alpha^t$ and $[\beta^t]^2$ is bounded by $(1-[\beta^t]^2)\mathcal{O}([\alpha^t]^2)$, and the difference between $(\beta^t-\beta^{t+1})/\beta^t$ and $\alpha^t \alpha^{t+1}$ is bounded by $(1-[\beta^t]^2)\mathcal{O}([\alpha^t]^4)$ (see remark on the proof of Proposition \ref{prop:dynamic} in Appendix \ref{sup:em}, Subsection \ref{supsub:em_dynamic}).
  • Figure 3: Comparison of convergence behavior and EM trajectories at population level.
  • Figure 4: Sublinear convergence rate bounds for $\alpha^t:=\|\theta^t\|/\sigma$ as shown in Proposition \ref{prop:sublinear_convg}, with different initial values $\alpha^0=0.02$, $0.05$, $0.1$ and balanced initial guess $\beta^0:=\|\pi^0-\frac{\mathds{1}}{2}\|_1=0$ over 200 EM iterations.
  • Figure 5: Theorem \ref{thm:finite} provides bounds on time complexity and statistical accuracy for fixed mixing weights $\pi^t = \pi^0$ in both sufficiently unbalanced and balanced cases.
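
The relations in the Figure 2 caption suggest a leading-order recursion, $\alpha^{t+1} \approx \alpha^t (1 - [\beta^t]^2)$ and $\beta^{t+1} \approx \beta^t (1 - \alpha^t \alpha^{t+1})$, obtained by dropping the $\mathcal{O}([\alpha^t]^2)$ and $\mathcal{O}([\alpha^t]^4)$ correction terms. The sketch below iterates only this truncated recursion; it is not the exact EM update, and the function name `leading_order_em_dynamics` and the chosen initial values are illustrative assumptions rather than anything taken from the paper.

```python
def leading_order_em_dynamics(alpha0, beta0, num_steps):
    """Iterate the truncated (leading-order) recursion suggested by Figure 2.

    Assumption: the higher-order correction terms are dropped, so this is
    only an approximation of the EM dynamics, not the exact update.
        alpha^{t+1} = alpha^t * (1 - (beta^t)^2)
        beta^{t+1}  = beta^t  * (1 - alpha^t * alpha^{t+1})
    """
    alphas, betas = [alpha0], [beta0]
    for _ in range(num_steps):
        a, b = alphas[-1], betas[-1]
        a_next = a * (1.0 - b * b)        # contraction factor set by beta^2
        b_next = b * (1.0 - a * a_next)   # beta moves slowly when alpha is small
        alphas.append(a_next)
        betas.append(b_next)
    return alphas, betas


if __name__ == "__main__":
    # Unbalanced initial guess (beta0 > 0): alpha shrinks geometrically.
    alphas, betas = leading_order_em_dynamics(alpha0=0.1, beta0=0.5, num_steps=20)
    for t in (0, 1, 5, 10, 20):
        print(f"t={t:2d}  alpha={alphas[t]:.6f}  beta={betas[t]:.6f}")

    # Balanced initial guess (beta0 = 0): the truncated recursion stalls.
    alphas_bal, _ = leading_order_em_dynamics(alpha0=0.1, beta0=0.0, num_steps=5)
    print("balanced:", alphas_bal)
```

With $\beta^0 > 0$ the truncated recursion contracts $\alpha^t$ by roughly a factor $1 - [\beta^t]^2$ per step, in line with the linear convergence stated for the unbalanced case. With $\beta^0 = 0$ it leaves $\alpha^t$ unchanged, which shows that any progress in the balanced case must come from the dropped higher-order terms, in line with the slower sublinear rate.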

Theorems & Definitions (118)

  • remark
  • remark
  • remark
  • proposition 4
  • remark
  • remark
  • remark
  • proposition 7
  • remark
  • proposition 8
  • ...and 108 more