Mixed Regression via Approximate Message Passing

Nelvin Tan, Ramji Venkataramanan

TL;DR

This work presents a matrix GLM framework for regression with multiple signals and latent variables, unifying mixed linear regression (MLR), max-affine regression (MAR), and mixture-of-experts (MOE). The authors develop an AMP algorithm tailored to matrix GLMs, together with a rigorous state-evolution analysis that predicts its high-dimensional performance and guides the design of optimal denoising functions. They derive Bayes-optimal denoisers and, for MAR, introduce an EM-AMP variant that estimates the intercepts alongside the signals. Numerical experiments on MLR, MAR, and MOE validate the theory and show that AMP significantly outperforms other estimators for MLR and MAR in most parameter regimes.

Abstract

We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to take advantage of any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP "denoising" functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.
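As a concrete illustration of the two observation models described above, the following Python sketch draws synthetic data from a toy matrix GLM. It is a minimal sketch under stated assumptions, not the authors' code: the function name `sample_matrix_glm`, the Gaussian design with i.i.d. $\mathcal{N}(0, 1/n)$ entries, the standard Gaussian signals, and the convention that `alpha` is the probability of drawing the first signal in mixed linear regression are all illustrative choices. In this sketch, each observation depends on $X_i$ only through the length-$L$ vector $\Theta_i = B^\top X_i$; mixture-of-experts is omitted for brevity.

```python
import random


def sample_matrix_glm(n, p, L, model, sigma=0.0, alpha=0.7, intercepts=None, seed=0):
    """Draw (X, y, B) from a toy matrix GLM with L >= 2 signals.

    Assumptions (illustrative, not from the paper): X has i.i.d. N(0, 1/n)
    entries, B is p x L with i.i.d. N(0, 1) entries, noise is N(0, sigma^2).
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, (1.0 / n) ** 0.5) for _ in range(p)] for _ in range(n)]
    B = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(p)]
    b = intercepts if intercepts is not None else [0.0] * L
    y = []
    for i in range(n):
        # Theta_i = B^T X_i: one linear predictor per signal vector
        theta = [sum(X[i][j] * B[j][l] for j in range(p)) for l in range(L)]
        noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        if model == "mlr":
            # Latent label: signal 0 with probability alpha, otherwise a
            # uniformly chosen other signal (an assumed mixing rule).
            c = 0 if rng.random() < alpha else rng.randrange(1, L)
            y.append(theta[c] + noise)
        elif model == "mar":
            # Max-affine: maximum of the L affine functions, plus noise.
            y.append(max(theta[l] + b[l] for l in range(L)) + noise)
        else:
            raise ValueError("model must be 'mlr' or 'mar'")
    return X, y, B
```

For example, `sample_matrix_glm(500, 200, 2, "mar", intercepts=[0.0, 1.0])` returns a design, noiseless responses, and the $p \times 2$ signal matrix for a two-piece max-affine model.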

Paper Structure

This paper contains 41 sections, 5 theorems, 111 equations, 16 figures, and 1 algorithm.

Key Result

Theorem 1

Consider the AMP algorithm in (eq:GAMP) for the matrix GLM in (eq:matrix-GLM). Suppose that the model assumptions in Section subsec:prelim_model, as well as (A1) and (A2), are satisfied, and that $\mathrm{T}^{1}_{B}$ is positive definite. Then, for each $k \ge 0$, as $n,p\rightarrow\infty$ with $n/p\rightarrow\delta$, we have …, where $\Theta_i=B^\top X_i$ for $1\leq i\leq n$. In the above, $G^{k+1}_B\sim$ …
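Theorem 1 describes the AMP iterates through Gaussian terms such as $G^{k+1}_B$, and the abstract notes that state evolution is used to choose denoising functions that minimize the estimation error. A minimal sketch of that idea, assuming (hypothetically, not taken from the theorem's statement) that each row of the AMP iterate behaves like an effective observation $s = A\bar b + g$ with $g \sim \mathcal{N}(0, T)$, is the posterior-mean denoiser for a Gaussian prior $\bar b \sim \mathcal{N}(0, \Sigma)$:

$$\mathbb{E}[\bar b \mid s] = \Sigma A^\top \left(A\Sigma A^\top + T\right)^{-1} s.$$

The matrices $A$ and $T$ (hypothetical names) would be supplied by state evolution. The code handles only $L = 2$, where the $2\times 2$ inverse has a closed form; a signal correlation $\rho$ would enter through the off-diagonal entries of $\Sigma$.

```python
# Hypothetical sketch for L = 2: posterior-mean (MMSE) denoiser for a Gaussian
# prior b ~ N(0, Sigma) observed through an assumed effective channel
# s = A b + g, g ~ N(0, T_cov). Matrices are lists of rows.


def matmul(P, Q):
    """Product of two small matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def inv2(P):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is numerically singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def gaussian_posterior_mean(s, A, Sigma, T_cov):
    """Return E[b | s] = Sigma A^T (A Sigma A^T + T_cov)^{-1} s for 2-vectors."""
    SAt = matmul(Sigma, transpose(A))      # Cov(b, s) = Sigma A^T
    cov_s = matmul(A, SAt)                 # A Sigma A^T
    cov_s = [[cov_s[i][j] + T_cov[i][j] for j in range(2)] for i in range(2)]
    gain = matmul(SAt, inv2(cov_s))        # 2x2 linear gain matrix
    return [sum(gain[i][j] * s[j] for j in range(2)) for i in range(2)]
```

With `I = [[1.0, 0.0], [0.0, 1.0]]`, the call `gaussian_posterior_mean([2.0, -4.0], I, I, I)` gives `[1.0, -2.0]`: each coordinate is shrunk by half. As $T \to 0$ with an invertible $A$, the denoiser reduces to $A^{-1}s$.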

Figures (16)

  • Figure 1: MLR, Gaussian prior with $\rho=0$: normalized squared correlation vs. $\delta$ for various noise levels $\sigma$, with $\alpha=0.7$.
  • Figure 2: MLR, Gaussian prior with different values of signal covariance $\rho$: Normalized squared correlation vs. $\delta$, with $\alpha=0.7$, $\sigma=0$.
  • Figure 3: MLR, Gaussian prior with $\rho=0$ and different values of estimated proportion $\hat{\alpha}$: Normalized squared correlation vs. $\delta$, with true $\alpha=0.7$, $\sigma=0$.
  • Figure 4: MLR: Heatmap of minimum normalized correlation for Bayes-optimal $f_k$, with $p=500$, $\sigma =0$.
  • Figure 5: MLR: Heatmap of minimum normalized correlation for soft-thresholding $f_k$, with $p=1000$, $\sigma=0$. Soft-thresholding tuning parameter: $\zeta =1.1402$.
  • ...and 11 more figures
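The figures report normalized (squared) correlation and compare a Bayes-optimal $f_k$ with a soft-thresholding $f_k$ tuned by $\zeta$. The short Python sketch below shows one common way to compute these quantities. The exact normalization used in the paper, the reading of "minimum" as the worst case over the $L$ signals, and the helper names are assumptions made for illustration. Soft-thresholding is applied entrywise as $\eta(x;\zeta) = \operatorname{sign}(x)\max(|x|-\zeta, 0)$; in AMP the threshold is often scaled by an estimated noise level, which is left to the caller here.

```python
import math


def normalized_sq_correlation(est, truth):
    """<est, truth>^2 / (||est||^2 ||truth||^2), in [0, 1]; 1 means the two
    vectors are aligned up to scale and sign."""
    num = sum(a * b for a, b in zip(est, truth)) ** 2
    den = sum(a * a for a in est) * sum(b * b for b in truth)
    return num / den if den > 0 else 0.0


def min_normalized_sq_correlation(est_signals, true_signals):
    """Worst-case normalized squared correlation over the L signals
    (hypothetical reading of the heatmaps' 'minimum' metric)."""
    return min(normalized_sq_correlation(e, t)
               for e, t in zip(est_signals, true_signals))


def soft_threshold(x, zeta):
    """Entrywise soft-thresholding: sign(v) * max(|v| - zeta, 0)."""
    return [math.copysign(max(abs(v) - zeta, 0.0), v) for v in x]
```

For instance, `soft_threshold([3.0, -0.5, -2.0], 1.0)` sets the small middle entry to zero and moves the other two entries toward zero by 1, which is why soft-thresholding suits sparse signals.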

Theorems & Definitions (5)

  • Theorem 1
  • Proposition 2
  • Proposition 3
  • Lemma 4
  • Lemma 5