Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms

Avishek Ghosh; Arya Mazumdar

TL;DR

This work establishes provable guarantees for agnostic learning of mixtures of $k$ linear regressions using EM and AM algorithms without assuming a generative data model. By analyzing both min-loss (AM) and soft-min loss (EM), the authors prove exponential convergence to population loss minimizers under suitable initialization and data-geometric conditions, with dimension-free initialization and explicit contraction rates. They introduce fresh-sample per-iteration techniques to manage inter-iteration dependencies and provide generalization bounds via Rademacher complexity for the resulting loss classes. The results improve upon prior agnostic analyses by enabling dimension-independent initialization and by deriving sharper contraction guarantees under Gaussian covariates, along with detailed generalization and sample-complexity assessments. Overall, the paper demonstrates that AM and EM can achieve robust, optimal learning in agnostic settings for mixed linear regressions.

Abstract

Mixed linear regression is a well-studied problem in parametric statistics and machine learning. Given a set of samples (tuples of covariates and labels), the task of mixed linear regression is to find a small list of linear relationships that best fit the samples. Usually it is assumed that the label is generated stochastically by randomly selecting one of two or more linear functions, applying the chosen function to the covariates, and potentially adding noise to the result. In that setting, the objective is to estimate the ground-truth linear functions up to some parameter error. The popular expectation maximization (EM) and alternating minimization (AM) algorithms have previously been analyzed for this setting. In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples, without such generative models. In particular, we show that the AM and EM algorithms, under standard conditions of separability and good initialization, lead to agnostic learning in mixed linear regression by converging to the population loss minimizers, for suitably defined loss functions. In some sense, this shows the strength of the AM and EM algorithms, which converge to "optimal solutions" even in the absence of realizable generative models.
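To make the "suitably defined loss functions" concrete, here is a minimal sketch of a min-loss (the objective associated with AM) and a soft-min loss (the objective associated with EM) for $k$ candidate linear functions. The function names, the `beta` parameter, and the exact softmax weighting are assumptions made for illustration; the paper's own definitions may differ in normalization.

```python
import numpy as np

def min_loss(thetas, X, y):
    """Mean over samples of the smallest squared residual among the k lines.

    thetas: (k, d) candidate parameters, X: (n, d) covariates, y: (n,) labels.
    """
    sq_res = (y[:, None] - X @ thetas.T) ** 2  # shape (n, k)
    return sq_res.min(axis=1).mean()

def soft_min_loss(thetas, X, y, beta=1.0):
    """Softmax-weighted mean of squared residuals (assumed form).

    Each sample weights line j proportionally to exp(-beta * residual_j^2),
    so lines that fit the sample better get more weight.
    """
    sq_res = (y[:, None] - X @ thetas.T) ** 2  # shape (n, k)
    logits = -beta * sq_res
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * sq_res).sum(axis=1).mean()
```

Because the soft-min loss is a convex combination of per-line residuals, it is never smaller than the min-loss on the same data, and it approaches the min-loss as `beta` grows.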

Paper Structure (24 sections, 7 theorems, 87 equations, 2 algorithms)

Introduction
Setup and Geometric Parameters
Summary of Contributions
Related works
Organization
Notation
Agnostic Mixed Linear Regression-Min-Loss
Gradient AM Algorithm
Theoretical Guarantees
EM algorithm for Soft-Min Loss
Gradient EM Algorithm
Theoretical Guarantees
Proof Sketches
Gradient AM (Theorem 2.1)
Gradient EM (Theorem 3.1)
...and 9 more sections

Key Result

Theorem 2.1

Suppose $x_i \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_d)$ and that $n' \geq C \frac{d \log(1/\pi_{\min})}{\pi_{\min}^3}$. Furthermore, [...] for all $j \in [k]$, where $c_{\mathsf{ini}}$ is a small positive constant (initialization parameter). Moreover, let the separation parameter satisfy [...]. Then running one iteration of Gradient AM with step size $\gamma$ yields $\{\theta^{+}_j\}_{j=1}^k$ satisfying [...]
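As a rough illustration of what "one iteration of Gradient AM with step size $\gamma$" can look like, the sketch below assigns each sample to the line with the smallest squared residual and then takes one gradient step per line on its assigned samples. The name `gradient_am_step`, the default `gamma`, and the $1/n$ normalization of the gradient are assumptions for illustration and are not taken from the paper's algorithm listing.

```python
import numpy as np

def gradient_am_step(thetas, X, y, gamma=0.1):
    """One hypothetical Gradient AM iteration on a batch (X, y).

    thetas: (k, d) current estimates, X: (n, d) covariates, y: (n,) labels.
    Returns the updated (k, d) estimates.
    """
    n = X.shape[0]
    residuals = y[:, None] - X @ thetas.T          # shape (n, k)
    labels = np.argmin(residuals ** 2, axis=1)     # best-fitting line per sample
    new_thetas = thetas.copy()
    for j in range(thetas.shape[0]):
        mask = labels == j
        # Gradient of (1/n) * sum over assigned samples of (y_i - <x_i, theta_j>)^2.
        grad = -2.0 / n * X[mask].T @ residuals[mask, j]
        new_thetas[j] = thetas[j] - gamma * grad
    return new_thetas
```

To mimic re-sampling, a caller would pass a fresh batch `(X, y)` of size $n'$ to each call, so that the current estimates are independent of the samples used in that step.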

Theorems & Definitions (19)

Theorem 2.1: Gradient AM
Remark 2.2: Contraction factor $\rho$
Remark 2.3: Error floor $\varepsilon$
Remark 2.4: Re-sampling
Remark 2.5: Probability of error $P_e$
Remark 2.6: Sample complexity
Theorem 3.1: Gradient EM
Remark 3.2: Error floor $\varepsilon$
Definition 5.1
Claim 5.2
...and 9 more
