Max-Linear Regression by Convex Programming

Seonho Kim, Sohail Bahmani, Kiryung Lee

TL;DR

This work presents Anchored Regression (AR), a scalable convex program for estimating the parameters of a multivariate max-linear regression model under Gaussian covariates and adversarial noise. AR convexifies the typically nonconvex least absolute deviation (LAD) objective by using an anchor vector, enabling nonasymptotic recovery guarantees with sample complexity that scales as $n \ge C \zeta^{-2}(4p\log^3 p\log^5 k+4\log(1/\delta)\log k)$, where $\zeta$ captures the cone geometry of the problem. In the balanced setting where the $k$ components are equally likely to be the maximum, this reduces to $n \asymp k^4 p$ up to logarithmic factors, matching known results for alternating minimization (AM) in certain regimes under noiseless conditions. The paper provides empirical evidence of AR’s robustness to outliers and deterministic noise, demonstrates competitive performance against AM in Gaussian settings, and introduces iterative AR (IAR) to further improve estimation accuracy. The theoretical development is complemented by a detailed comparison of computational costs and a tightness result for the fundamental bound, highlighting AR’s practical viability for large-scale max-linear regression problems.
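To get a feel for how the sample-complexity expression above behaves, the following minimal sketch evaluates its right-hand side. The summary does not give the constant $C$ or a way to compute $\zeta$, so both are hypothetical inputs here, natural logarithms are an assumption, and the function name `ar_sample_bound` is made up for illustration.

```python
import math


def ar_sample_bound(p, k, delta, zeta, C=1.0):
    """Evaluate C * zeta^-2 * (4 p log^3 p log^5 k + 4 log(1/delta) log k).

    C is an unspecified absolute constant in the source; C=1.0 is only a
    placeholder. zeta must be supplied by the caller (assumption).
    Natural logarithms are assumed.
    """
    if p < 2 or k < 2:
        raise ValueError("p and k must be at least 2 so the logs are positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if zeta <= 0.0:
        raise ValueError("zeta must be positive")
    # Dimension-dependent term: 4 p log^3(p) log^5(k)
    main_term = 4.0 * p * math.log(p) ** 3 * math.log(k) ** 5
    # Confidence-dependent term: 4 log(1/delta) log(k)
    tail_term = 4.0 * math.log(1.0 / delta) * math.log(k)
    return C * (main_term + tail_term) / zeta ** 2


# Halving zeta multiplies the bound by four, reflecting the zeta^-2 factor.
bound_a = ar_sample_bound(p=30, k=6, delta=0.05, zeta=1.0)
bound_b = ar_sample_bound(p=30, k=6, delta=0.05, zeta=0.5)
```

Because $C$ is a placeholder, the returned value is only meaningful for comparing how the bound scales with $p$, $k$, $\delta$, and $\zeta$, not as an actual number of samples.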

Abstract

We consider the multivariate max-linear regression problem where the model parameters $\boldsymbol{\beta}_{1},\dotsc,\boldsymbol{\beta}_{k}\in\mathbb{R}^{p}$ need to be estimated from $n$ independent samples of the (noisy) observations $y = \max_{1\leq j \leq k} \boldsymbol{\beta}_{j}^{\mathsf{T}} \boldsymbol{x} + \mathrm{noise}$. The max-linear model vastly generalizes the conventional linear model, and it can approximate any convex function to an arbitrary accuracy when the number of linear models $k$ is large enough. However, the inherent nonlinearity of the max-linear model renders the estimation of the regression parameters computationally challenging. In particular, no estimator based on convex programming is known in the literature. We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem. Under the standard Gaussian observation setting, we present a non-asymptotic performance guarantee showing that the convex program recovers the parameters with high probability. When the $k$ linear components are equally likely to achieve the maximum, our result shows that the number of noise-free observations sufficient for exact recovery scales as $k^{4}p$ up to a logarithmic factor. This sample complexity coincides with that of alternating minimization (Ghosh et al., 2021). Moreover, the same sample complexity applies when the observations are corrupted with arbitrary deterministic noise. We provide empirical results showing that our method performs as our theoretical result predicts and is competitive with the alternating minimization algorithm, particularly in the presence of multiplicative Bernoulli noise. Furthermore, we show empirically that a recursive application of AR can significantly improve the estimation accuracy.
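The observation model $y = \max_{1\leq j \leq k} \boldsymbol{\beta}_{j}^{\mathsf{T}} \boldsymbol{x} + \mathrm{noise}$ with standard Gaussian covariates can be simulated in a few lines. The sketch below uses only the Python standard library. Drawing the ground-truth parameters from a Gaussian, using additive Gaussian noise, and the name `simulate_max_linear` are all assumptions made for illustration, not choices taken from the paper.

```python
import random


def simulate_max_linear(n, p, k, noise_std=0.0, seed=0):
    """Draw n samples from y = max_j <beta_j, x> + noise with x ~ Normal(0, I_p).

    The ground-truth betas are drawn i.i.d. standard Gaussian (assumption).
    Returns the parameters, covariates, responses, and the index of the
    linear component that attains the maximum for each sample.
    """
    rng = random.Random(seed)
    betas = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(k)]
    xs, ys, active = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Inner product of x with each of the k parameter vectors.
        scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
        j_star = max(range(k), key=scores.__getitem__)
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        xs.append(x)
        ys.append(scores[j_star] + noise)
        active.append(j_star)
    return betas, xs, ys, active


betas, xs, ys, active = simulate_max_linear(n=200, p=5, k=3)
# How often each component attains the maximum; roughly equal counts
# correspond to the balanced setting discussed in the abstract.
counts = [active.count(j) for j in range(3)]
```

Inspecting `counts` gives a quick empirical check of how balanced the components are for a given set of parameters.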

Paper Structure

This paper contains 17 sections, 10 theorems, 134 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\{\mathcal{C}_j\}_{j=1}^k$ and $\{\widetilde{\mathcal{C}}_j\}_{j=1}^k$ be respectively defined as in def:plyhdrl and def:plyhdrl_tilde. Let $\boldsymbol{\theta}$ be as in eq:anchorvector and $\{\boldsymbol{x}_i\}_{i=1}^n$ be independent copies of $\boldsymbol{g} \sim \mathrm{Normal}(\boldsymbol{0}, \boldsymbol{I}_p)$. Then there exis… If the feasible set of the optimization problem in eq:estimator is not empty and the number of obse…

Figures (5)

  • Figure 1: Phase transition of recovery rate for varying $n$ and $p$ in the noiseless case ($k=5$).
  • Figure 2: Phase transition of recovery rate for varying $n$ and $k$ in the noiseless case ($p=20$).
  • Figure 3: Estimation error versus the number of observations $n$ under Gaussian noise of variance $\sigma^2$ ($k=6$ and $p=30$): repeated random initialization (black line with square markers), AR (green line with triangle markers), iterative AR (IAR; blue line with circle markers), and AM (red dashed line). All methods start from the repeated random initialization.
  • Figure 4: Estimation error and validation error via cross-validation by AR for varying $\eta$ ($k=3$, $p=30$, and $n=1{,}500$): The dotted vertical line indicates the location of $\eta_\star$ that achieves the equality in eq:cond_eta.
  • Figure 5: Estimation error versus the number of observations $n$ under multiplicative Bernoulli noise model with probability $\varphi$ ($k=6$ and $p=30$): repeated random initialization (black line with square markers), AR (green line with triangle markers), IAR (blue line with circle markers), AM (red dashed line), and AM-LAD (magenta line with asterisk markers). All methods start from repeated random initialization.

Theorems & Definitions (11)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Proposition 1
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Example 1
  • Lemma 6
  • Lemma 7
  • ...and 1 more