Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression

Zhankun Luo, Abolfazl Hashemi

TL;DR

This work analyzes the convergence of EM for two-component mixed linear regression (2MLR) with unlabeled data. It derives explicit population EM updates across all SNR regimes using Bessel functions, and shows that in the noiseless setting the EM iterates follow a cycloid trajectory within the span of the initialization and the ground truth, which yields a precise recurrence for the sub-optimality angle. The authors prove a transition from linear to quadratic convergence and establish finite-sample error bounds for the regression parameters and mixing weights, with a three-stage convergence scheme and minimal dependence on the mixing weights. Empirical results validate the cycloid trajectory, show robust quadratic convergence at high SNR, and demonstrate that the regression update is largely independent of the true mixing weights. The authors suggest extensions to weak separation and to more than two components.

Abstract

We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations, and the EM algorithm is widely used to fit mixtures of linear regressions. Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high-SNR settings under some assumptions, and its global convergence rate with random initialization has been affirmed. However, the exponent of convergence has not been theoretically estimated, and the geometric properties of the trajectory of EM iterations are not well understood. In this paper, first, using Bessel functions, we provide explicit closed-form expressions for the EM updates under all SNR regimes. Then, in the noiseless setting, we completely characterize the behavior of EM iterations by deriving a recurrence relation at the population level, and notably show that all the iterations lie on a certain cycloid. Based on this new trajectory-based analysis, we give a theoretical estimate for the exponent of super-linear convergence and further improve the statistical error bound at the finite-sample level. Our analysis provides a new framework for studying the behavior of EM for Mixed Linear Regression.
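To make the setting concrete, here is a minimal sketch of finite-sample EM for 2MLR. It assumes the standard symmetric model $y = \pm\langle x, \theta^\ast\rangle + \varepsilon$ with Gaussian covariates, a known noise level $\sigma$, and mixing weights parameterized through $\tanh(\nu)$ (the difference of the two weights). The function name `em_2mlr`, the synthetic data, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def em_2mlr(X, y, theta0, sigma, n_iters=30):
    """Finite-sample EM for symmetric 2MLR (illustrative sketch).

    Returns (theta, imbalance), where imbalance estimates tanh(nu),
    i.e. the difference between the two mixing weights.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    nu, imbalance = 0.0, 0.0           # start from balanced mixing weights
    gram = X.T @ X
    for _ in range(n_iters):
        # E-step: posterior difference P(label=+1 | x, y) - P(label=-1 | x, y)
        w = np.tanh(y * (X @ theta) / sigma**2 + nu)
        # M-step for theta: weighted least squares
        theta = np.linalg.solve(gram, X.T @ (w * y))
        # M-step for mixing weights: tanh(nu) is the mean posterior difference
        imbalance = float(np.clip(w.mean(), -1 + 1e-12, 1 - 1e-12))
        nu = np.arctanh(imbalance)
    return theta, imbalance

# Synthetic high-SNR data (all values below are assumptions for the demo).
rng = np.random.default_rng(0)
n, d, sigma = 5000, 5, 0.01
theta_star = np.zeros(d)
theta_star[0] = 1.0
signs = rng.choice([1.0, -1.0], size=n, p=[0.7, 0.3])   # true weights (0.7, 0.3)
X = rng.standard_normal((n, d))
y = signs * (X @ theta_star) + sigma * rng.standard_normal(n)

# Initialize at an angle of 1 radian from theta_star.
theta0 = np.zeros(d)
theta0[0], theta0[1] = np.cos(1.0), np.sin(1.0)

theta_hat, imbalance_hat = em_2mlr(X, y, theta0, sigma)
# The model is identifiable only up to a global sign flip.
err = min(np.linalg.norm(theta_hat - theta_star),
          np.linalg.norm(theta_hat + theta_star))
print(err, imbalance_hat)
```

At high SNR the `tanh` weights saturate to signs, so the hidden labels cancel in the product `w * y` and the regression update barely depends on the true mixing weights, which is the behavior the summary describes.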

Paper Structure

This paper contains 21 sections, 51 theorems, 221 equations, and 4 figures.

Key Result

theorem 5

(EM Updates across All SNR) Let $\rho := \frac{\langle \theta, \theta^{\ast} \rangle}{\| \theta \| \cdot \| \theta^{\ast} \|}, \bar{\theta} := \frac{\theta}{\sigma}, \bar{\theta}^{\ast} := \frac{\theta^{\ast}}{\sigma}$, then the EM update rules for $\theta, \tanh(\nu)$ at Population level are where these coefficients are defined as

Figures (4)

  • Figure 1: The EM update $M(\theta, \nu)$ for regression parameters lies on span$\{\theta, \theta^\ast\}$.
  • Figure 2: The cycloid trajectory for the EM update $M(\theta, \nu)$ of regression parameters $\theta$. The figure further shows the two global solutions (red dots), the unstable solution (blue dot), and the two saddle points (green dots). As long as the initial suboptimality angle is sufficiently large, $\varphi^t$ and in turn $\theta^t$ super-linearly converge to $\frac{\pi}{2}$ and $\theta^\ast$.
  • Figure 3: Cycloid trajectory of EM iterations $\theta^t$. We perform 100 iterations of finite-sample EM with SNR=$10^8$ and varying dimensions ($d=2,3,50$).
  • Figure 4: Left and Middle: Quadratic convergence and correlation are shown with $\theta^\ast, \theta^0$ from $d=50$ unit sphere, s.t. $\varphi^0 = \arctan(1.5)$ in Panel (a), $\varphi^0 = 0.3$ in Panel (b). Right: The errors of regression parameters and mixing weights for ten EM iterations, with $d=50, \varphi^0=0.3$, SNR=$10^8$ and different true mixing weights $\pi^\ast=\{0.6, 0.4\},\{0.8, 0.2\}, \{1, 0\}$.
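
The cycloid shape in the figures above can be checked numerically. The sketch below is a derivation made here, not the paper's statement: it assumes $x \sim \mathcal{N}(0, I)$, a unit-norm $\theta^\ast$, and the noiseless limit in which the E-step weights become signs. Writing $\alpha$ for the angle between $\theta$ and $\theta^\ast$ (so $\alpha \to 0$ at convergence, whereas the captions' $\varphi$ tends to $\frac{\pi}{2}$), the population update in the span of $\theta$ and $\theta^\ast$ works out to $\frac{1}{\pi}\left(\pi - 2\alpha + \sin 2\alpha,\; 1 - \cos 2\alpha\right)$, a cycloid parameterization. The helper names are hypothetical, and the paper's notation and normalization may differ.

```python
import numpy as np

def closed_form_update(alpha):
    """Noiseless population EM update in coordinates (along theta*, orthogonal to it).

    Derived here under the stated assumptions; not copied from the paper.
    """
    return np.array([np.pi - 2 * alpha + np.sin(2 * alpha),
                     1.0 - np.cos(2 * alpha)]) / np.pi

def next_angle(alpha):
    """Angle between the updated iterate and theta*."""
    u = closed_form_update(alpha)
    return float(np.arctan2(u[1], u[0]))

# Iterate the angle recurrence from an initial angle of 1 radian.
angles = [1.0]
for _ in range(6):
    angles.append(next_angle(angles[-1]))
print(angles)

# Monte Carlo check of the closed form in d = 2.
rng = np.random.default_rng(0)
alpha0 = 1.0
X = rng.standard_normal((200_000, 2))
theta_star = np.array([1.0, 0.0])
theta = np.array([np.cos(alpha0), np.sin(alpha0)])
y = X @ theta_star                      # hidden labels cancel in the noiseless update
weights = np.sign(y * (X @ theta))      # hard E-step weights
mc_update = (weights[:, None] * y[:, None] * X).mean(axis=0)
mc_gap = float(np.linalg.norm(mc_update - closed_form_update(alpha0)))
print(mc_gap)
```

For small $\alpha$ the recurrence behaves like $\alpha \mapsto 2\alpha^2/\pi$, which is consistent with the quadratic convergence reported in Figure 4.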

Theorems & Definitions (88)

  • theorem 5
  • corollary 6
  • corollary 7
  • theorem 8
  • corollary 9
  • proposition 10
  • proposition 11
  • proposition 12
  • theorem 13
  • proposition 14
  • ...and 78 more