Structural Properties, Cycloid Trajectories and Non-Asymptotic Guarantees of EM Algorithm for Mixed Linear Regression

Zhankun Luo, Abolfazl Hashemi

TL;DR

This work tackles EM for a symmetric two-component Mixed Linear Regression (2MLR) model with unknown mixing weights, deriving explicit population and finite-sample update rules across all SNR levels ($\eta=\|\theta^{\ast}\|/\sigma$). It reveals that in the noiseless limit the regression trajectory follows a cycloid with rolling radius $\|\theta^{\ast}\|/\pi$, and it quantifies deviations at high SNR; it further provides non-asymptotic convergence guarantees by connecting EM statistical accuracy to a sub-optimality angle and proving convergence from arbitrary initialization via Easy-EM then EM. The paper delivers explicit EM update expressions, shows fixed points and contraction, and develops a trajectory-based framework that unifies population and finite-sample analyses across SNR regimes. Empirical results corroborate the cycloid trajectories, linear and quadratic convergence phases, and finite-sample convergence from varied initializations, underscoring the practical impact for latent-variable regression tasks such as haplotype assembly and phase retrieval.

Abstract

This work investigates the structural properties, cycloid trajectories, and non-asymptotic convergence guarantees of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR) with unknown mixing weights and regression parameters. Recent studies have established global convergence for 2MLR with known balanced weights and super-linear convergence in noiseless and high signal-to-noise ratio (SNR) regimes. However, the theoretical behavior of EM in the fully unknown setting remains unclear, with its trajectory and convergence order not yet fully characterized. We derive explicit EM update expressions for 2MLR with unknown mixing weights and regression parameters across all SNR regimes and analyze their structural properties and cycloid trajectories. In the noiseless case, we prove that the trajectory of the regression parameters in EM iterations traces a cycloid by establishing a recurrence relation for the sub-optimality angle, while in high SNR regimes we quantify its discrepancy from the cycloid trajectory. The trajectory-based analysis reveals the order of convergence: linear when the EM estimate is nearly orthogonal to the ground truth, and quadratic when the angle between the estimate and ground truth is small at the population level. Our analysis establishes non-asymptotic guarantees by sharpening bounds on statistical errors between finite-sample and population EM updates, relating EM's statistical accuracy to the sub-optimality angle, and proving convergence with arbitrary initialization at the finite-sample level. This work provides a novel trajectory-based framework for analyzing EM in Mixed Linear Regression.
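As a rough illustration of the setting described in the abstract, the following is a minimal sketch of textbook finite-sample EM for the symmetric model $y_i = z_i \langle x_i, \theta^\ast \rangle + \varepsilon_i$ with hidden signs $z_i \in \{+1, -1\}$ and unknown mixing weights. It is not the paper's explicit update expressions or its Easy-EM variant. The function names `simulate_2mlr` and `em_2mlr`, the Gaussian design, the known noise level `sigma`, and the log-odds parametrization `nu` are all assumptions made for this example.

```python
import numpy as np

def simulate_2mlr(n, d, pi1, sigma, seed=0):
    """Draw (X, y) from a symmetric 2MLR model (hypothetical helper).

    Assumes standard Gaussian covariates and theta_star = e_1.
    """
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(d)
    theta_star[0] = 1.0
    X = rng.normal(size=(n, d))
    # Hidden sign: +1 with probability pi1, -1 otherwise.
    z = np.where(rng.random(n) < pi1, 1.0, -1.0)
    y = z * (X @ theta_star) + sigma * rng.normal(size=n)
    return X, y, theta_star

def em_2mlr(X, y, theta0, pi1_0, sigma, n_iters=50):
    """Standard EM for symmetric 2MLR with unknown mixing weight pi1."""
    theta = np.asarray(theta0, dtype=float).copy()
    pi1 = float(pi1_0)
    gram = X.T @ X
    for _ in range(n_iters):
        # Half log-odds of the mixing weights, clipped to stay finite.
        p = np.clip(pi1, 1e-12, 1.0 - 1e-12)
        nu = 0.5 * np.log(p / (1.0 - p))
        # E-step: posterior mean of the hidden sign z_i given (x_i, y_i).
        w = np.tanh(y * (X @ theta) / sigma**2 + nu)
        # M-step: weighted least squares for theta, then the mixing weight.
        theta = np.linalg.solve(gram, X.T @ (w * y))
        pi1 = 0.5 * (1.0 + w.mean())
    return theta, pi1
```

Because the model is symmetric, the pairs $(\theta^\ast, \pi^\ast)$ and $(-\theta^\ast, \text{swapped } \pi^\ast)$ describe the same distribution, so any accuracy check on the output of `em_2mlr` should compare against both signs of $\theta^\ast$.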

Paper Structure

This paper contains 37 sections, 61 theorems, 273 equations, 3 figures.

Key Result

Proposition 1

Let $f(\theta, \pi):=-\mathbb{E}_{s\sim p(s\mid \theta^\ast, \pi^\ast)}[\ln p(s\mid \theta, \pi)]$ be the negative log-likelihood function at the population level, and $f_n(\theta, \pi):=-\frac{1}{n}\sum_{i=1}^n \ln p(s_i\mid \theta, \pi)$ be the negative log-likelihood function at the finite-sample level, where $\mathbb{E}[\cdot]=\mathbb{E}_{s\sim p(s\mid \theta^\ast, \pi^\ast)}[\cdot]$ is the expectati

Figures (3)

  • Figure 1: The cycloid trajectory for the EM update rule $M(\theta, \nu)$ of the regression parameters in the noiseless setting (SNR $\eta \to \infty$), together with the fixed points of the population EM update rules. The two red points are the ground-truth parameters $\theta^\ast$ and $-\theta^\ast$, and the blue point is the unstable fixed point $\vec{0}$; these are the distinct fixed points of the EM update rules in Proposition \ref{prop:distinct_fixed_points}. The green points are the two saddle points $\pm \lim_{\eta\to\infty}k^\ast(\eta) \| \theta^\ast\| \hat{e}_2 = \pm \frac{2}{\pi} \| \theta^\ast\| \hat{e}_2$, which are fixed points of the EM update rules on the plane $\text{span}\{\theta, \theta^\ast\}$ in Proposition \ref{prop:contraction_property}, with the contraction property along the direction $\pm\hat{e}_2$ orthogonal to the ground truth $\theta^\ast$. (a) Sub-optimality angle $\varphi$: the angle between the unit direction vector $\hat{e}_2$ and the regression parameters $\theta$; $\varphi^{t}, \varphi^{t-1}$ are the sub-optimality angles at the $t$-th and $(t-1)$-th EM iterations, where the regression parameters take the values $\theta^t$ and $\theta^{t-1}$, respectively. (b) Sub-optimality angle $\phi$: twice the minimum angle between $\theta$ and $\pm\theta^\ast$, i.e., $\phi = 2\arccos\left(|\langle \theta, \theta^\ast \rangle|/(\|\theta\|\|\theta^\ast\|)\right)$; in the noiseless setting, $\theta^{t}$ follows a cycloid trajectory with rolling radius $\|\theta^\ast\|/\pi$, where the rolling angle is determined by the previous sub-optimality angle $\phi^{t-1}$ (Proposition \ref{prop:parametric_cycloid}).
  • Figure 2: Cycloid trajectories of EM iterations for regression parameters $\theta^t$: we run 100 iterations of finite-sample EM at $\mathrm{SNR}=10^8$ for varying dimensions ($d=2,3,50$). (a) $d=2$, trajectories of $\theta^t$ across 60 trials with $\theta^\ast=(1, 0)$, $\pi^\ast=(0.7, 0.3)$; initial values $\theta^0$ and $\pi^0$ are uniformly sampled from $[-2, 2]^2$ and $[0, 1]$, respectively. (b) $d=3$, trajectories of $\theta^t$ across 10 trials, where $\theta^\ast, \theta^0$ are sampled from the three-dimensional unit sphere, and $\pi^\ast, \pi^0$ are drawn uniformly from $[0, 1]$. (c) $d=50$, trajectories of $\theta^t$ across 60 trials, with $\theta^\ast, \theta^0$ sampled from $\mathcal{N}(0, I_d)$, and $\pi^\ast, \pi^0$ drawn uniformly from $[0, 1]$.
  • Figure 3: Left and middle panels illustrate the quadratic convergence of the sub-optimality angle $\phi^t$ and its correlation with the mixing-weight error. Both use $\theta^\ast$ and $\theta^0$ sampled from the unit sphere in dimension $d=50$, with $\phi^0 = 1.4$ (equivalently, $\varphi^0 = (\pi - 1.4)/2$) in (a) and $\varphi^0 = 0.3$ in (b). The right panel (c) shows the accuracy of the EM estimates for the regression parameters and mixing weights over ten EM iterations with $d=50$, $\varphi^0 = 0.3$ at $\mathrm{SNR} = 10^6$, and varying true mixing weights $\pi^\ast =(0.5, 0.5), (0.6, 0.4), (0.8, 0.2)$, and $(1 - 10^{-6}, 10^{-6})$. (a) Quadratic convergence of the sub-optimality angle $\phi^t$ with all EM iterations starting with $\phi^0 = 1.4$ and $\pi^\ast(1), \pi^0(1)$ drawn uniformly from $[0, 1]$. (b) Correlation between the mixing-weight error $|\pi^t - \bar{\pi}^\ast|_1$ and the preceding sub-optimality angle $\phi^{t-1} = \arccos\left|\langle \theta^{t-1}, \theta^\ast \rangle/(\|\theta^{t-1}\|\|\theta^\ast\|)\right|$. (c) Accuracy of the EM estimates for the regression parameters and mixing weights versus iteration for different ground-truth mixing weights.
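
The geometric quantities in the figure captions can be checked numerically. The sketch below computes the two sub-optimality angles from the definitions in Figure 1, using the relation $\varphi = (\pi - \phi)/2$ implied by the Figure 3 caption, and samples a textbook cycloid with rolling radius $\|\theta^\ast\|/\pi$. The function names, the unsigned convention for $\varphi$, and the placement of the arch (running from $-\|\theta^\ast\|$ to $+\|\theta^\ast\|$ along the $\theta^\ast$ axis) are assumptions of this example, not the paper's parametrization.

```python
import numpy as np

def suboptimality_angles(theta, theta_star):
    """Return (phi, varphi) for an estimate theta and ground truth theta_star."""
    theta = np.asarray(theta, dtype=float)
    theta_star = np.asarray(theta_star, dtype=float)
    c = abs(theta @ theta_star) / (np.linalg.norm(theta) * np.linalg.norm(theta_star))
    c = min(c, 1.0)  # guard against round-off pushing the cosine above 1
    phi = 2.0 * np.arccos(c)      # twice the smaller angle to +theta* or -theta*
    varphi = 0.5 * (np.pi - phi)  # unsigned angle to the direction orthogonal to theta*
    return phi, varphi

def cycloid_arch(theta_star_norm, num=5):
    """Sample one arch of a cycloid with rolling radius theta_star_norm / pi."""
    r = theta_star_norm / np.pi
    t = np.linspace(0.0, 2.0 * np.pi, num)  # rolling angle of the generating circle
    x = r * (t - np.sin(t)) - theta_star_norm  # coordinate along theta*
    y = r * (1.0 - np.cos(t))                  # coordinate along e_2
    return x, y
```

With this placement the arch has width $2\pi r = 2\|\theta^\ast\|$, so its endpoints sit at $\pm\theta^\ast$, and its apex has height $2r = \frac{2}{\pi}\|\theta^\ast\|$, which agrees with the saddle-point location stated in the Figure 1 caption.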

Theorems & Definitions (122)

  • Proposition 1: Population and Finite-Sample Negative Log-Likelihood
  • remark
  • Lemma 2: EM Update Rules and Gradients
  • remark
  • Proposition 3: Connection between EM Update Rules and Gradient Descent
  • remark
  • Theorem 4: Explicit EM Update Expressions
  • remark
  • Proposition 5: Boundedness of EM Update Rule
  • remark
  • ...and 112 more