Structural Properties, Cycloid Trajectories and Non-Asymptotic Guarantees of EM Algorithm for Mixed Linear Regression
Zhankun Luo, Abolfazl Hashemi
TL;DR
This work analyzes the EM algorithm for a symmetric two-component Mixed Linear Regression (2MLR) model with unknown mixing weights, deriving explicit population and finite-sample update rules across all SNR levels ($\eta=\|\theta^{\ast}\|/\sigma$). It shows that in the noiseless limit the regression trajectory follows a cycloid with rolling radius $\|\theta^{\ast}\|/\pi$, and it quantifies the deviation from that cycloid at high SNR. It further provides non-asymptotic convergence guarantees by connecting EM's statistical accuracy to a sub-optimality angle and by proving convergence from arbitrary initialization via Easy-EM followed by EM. The paper delivers explicit EM update expressions, characterizes fixed points and contraction, and develops a trajectory-based framework that unifies population and finite-sample analyses across SNR regimes. Empirical results corroborate the cycloid trajectories, the linear and quadratic convergence phases, and finite-sample convergence from varied initializations, underscoring the practical relevance to latent-variable regression tasks such as haplotype assembly and phase retrieval.
Abstract
This work investigates the structural properties, cycloid trajectories, and non-asymptotic convergence guarantees of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR) with unknown mixing weights and regression parameters. Recent studies have established global convergence for 2MLR with known balanced weights and super-linear convergence in noiseless and high signal-to-noise ratio (SNR) regimes. However, the theoretical behavior of EM in the fully unknown setting remains unclear, with its trajectory and convergence order not yet fully characterized. We derive explicit EM update expressions for 2MLR with unknown mixing weights and regression parameters across all SNR regimes and analyze their structural properties and cycloid trajectories. In the noiseless case, we prove that the trajectory of the regression parameters in EM iterations traces a cycloid by establishing a recurrence relation for the sub-optimality angle, while in high SNR regimes we quantify its discrepancy from the cycloid trajectory. The trajectory-based analysis reveals the order of convergence: linear when the EM estimate is nearly orthogonal to the ground truth, and quadratic when the angle between the estimate and ground truth is small at the population level. Our analysis establishes non-asymptotic guarantees by sharpening bounds on statistical errors between finite-sample and population EM updates, relating EM's statistical accuracy to the sub-optimality angle, and proving convergence with arbitrary initialization at the finite-sample level. This work provides a novel trajectory-based framework for analyzing EM in Mixed Linear Regression.
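For readers who want a concrete picture of the setting, the following is a minimal sketch of finite-sample EM for symmetric 2MLR with unknown mixing weight. It is the standard textbook EM iteration, not the paper's closed-form update expressions. It assumes the model $y_i = z_i \langle x_i, \theta^{\ast} \rangle + \varepsilon_i$ with $z_i \in \{+1, -1\}$, $P(z_i = +1) = \pi^{\ast}$, Gaussian covariates, and a known noise level $\sigma$. The function names `simulate_2mlr` and `em_2mlr` and all parameter defaults are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_2mlr(n, theta_star, pi_star, sigma, rng):
    """Draw n samples from the assumed symmetric 2MLR model."""
    d = theta_star.shape[0]
    X = rng.standard_normal((n, d))                      # Gaussian covariates (assumption)
    z = np.where(rng.random(n) < pi_star, 1.0, -1.0)     # latent signs, P(z=+1)=pi_star
    y = z * (X @ theta_star) + sigma * rng.standard_normal(n)
    return X, y


def em_2mlr(X, y, theta0, pi0=0.5, sigma=1.0, n_iters=100):
    """Standard EM for symmetric 2MLR with unknown mixing weight, known sigma."""
    theta = np.asarray(theta0, dtype=float).copy()
    pi = float(pi0)
    gram = X.T @ X                                       # reused in every M-step
    for _ in range(n_iters):
        # E-step: E[z_i | x_i, y_i] = tanh(y_i <x_i, theta> / sigma^2 + nu),
        # where nu = 0.5 * log(pi / (1 - pi)).
        nu = 0.5 * np.log(pi / (1.0 - pi))
        ez = np.tanh(y * (X @ theta) / sigma**2 + nu)
        # M-step: weighted least squares for theta, posterior average for pi.
        theta = np.linalg.solve(gram, X.T @ (ez * y))
        pi = float(np.clip(0.5 * (1.0 + ez.mean()), 1e-6, 1.0 - 1e-6))
    return theta, pi
```

Because $(\theta, \pi)$ and $(-\theta, 1 - \pi)$ describe the same model, any comparison against the ground truth should first align the sign of the estimate. Replacing the Gram matrix solve with a plain average $\frac{1}{n}\sum_i \mathbb{E}[z_i]\, y_i x_i$ gives what is commonly called Easy-EM; whether that matches the paper's exact definition is an assumption here.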
