Multiple Regression for Matrix and Vector Predictors: Models, Theory, Algorithms, and Beyond

Meixia Lin, Ziyang Zeng, Yangjing Zhang

TL;DR

The paper studies regression where responses depend on both matrix predictors $X_i$ and vector predictors $z_i$ through $y_i \approx \langle X_i,B\rangle + \langle z_i,\gamma\rangle$, and introduces a general convex regularized framework with penalties such as $\phi(B)=\rho\|B\|_*$ and $\psi(\gamma)=\lambda\|\gamma\|_1$. It develops a preconditioned proximal point algorithm (PPA) whose subproblems are solved through a dual problem: the dual objective $\Phi_k(\xi)$ is differentiable with Lipschitz gradient, so it can be minimized by a semismooth Newton method that exploits second-order sparsity, and the primal variables are then updated via proximal maps. The main contributions are (i) asymptotic consistency results ($n$-consistency and $\sqrt{n}$-consistency) under nuclear and $\ell_1$ penalties, (ii) a scalable, robust solver that leverages second-order information, and (iii) extensive empirical evidence showing improved estimation, prediction accuracy, and computational efficiency over ADMM and Nesterov methods on both synthetic and COVID-19 data. The framework extends to other convex losses (and potentially to logistic/Poisson) and offers a path toward efficient handling of high-dimensional matrix–vector regression problems with structured penalties in real-world multivariate data analysis.
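
As a concrete instance, assuming a least-squares loss combined with the nuclear-norm and $\ell_1$ penalties named above (the $\tfrac{1}{2}$ scaling is a notational assumption here, not taken from the paper), the estimator can be sketched as

$$
\min_{B\in\mathbb{R}^{m\times q},\ \gamma\in\mathbb{R}^{p}}\ \frac{1}{2}\sum_{i=1}^{n}\bigl(y_i-\langle X_i,B\rangle-\langle z_i,\gamma\rangle\bigr)^2+\rho\|B\|_*+\lambda\|\gamma\|_1 ,
$$

where $\langle X_i,B\rangle=\operatorname{tr}(X_i^{\top}B)$ is the trace inner product and $\rho,\lambda\geq 0$ are the penalty parameters. Other convex choices of $\phi$ and $\psi$ would replace the last two terms.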

Abstract

Matrix regression plays an important role in modern data analysis due to its ability to handle complex relationships involving both matrix and vector variables. We propose a class of regularized regression models capable of predicting both matrix and vector variables, accommodating various regularization techniques tailored to the inherent structures of the data. We establish the consistency of our estimator when penalizing the nuclear norm of the matrix variable and the $\ell_1$ norm of the vector variable. To tackle the general regularized regression model, we propose a unified framework based on an efficient preconditioned proximal point algorithm. Numerical experiments demonstrate the superior estimation and prediction accuracy of our proposed estimator, as well as the efficiency of our algorithm compared to the state-of-the-art solvers.
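
The nuclear-norm and $\ell_1$ penalties mentioned in the abstract both have closed-form proximal maps, which is what makes proximal-type algorithms practical for this model. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the array layout (`X` of shape `(n, m, q)`, `Z` of shape `(n, p)`), and the least-squares form of the objective are assumptions made for illustration.

```python
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: proximal map of t * ||.||_1 at the vector v."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_nuclear(M, t):
    """Singular value thresholding: proximal map of t * ||.||_* at the matrix M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Shrink each singular value by t and rebuild the matrix.
    return (U * np.maximum(s - t, 0.0)) @ Vt

def objective(B, gamma, X, Z, y, rho, lam):
    """Assumed objective: least-squares loss + rho*||B||_* + lam*||gamma||_1."""
    # <X_i, B> for every sample i, computed as a sum over matrix entries.
    fitted = np.einsum("imq,mq->i", X, B) + Z @ gamma
    resid = y - fitted
    return (0.5 * resid @ resid
            + rho * np.linalg.norm(B, "nuc")
            + lam * np.abs(gamma).sum())
```

A simple proximal gradient loop could alternate a gradient step on the loss with `prox_nuclear` on $B$ and `prox_l1` on $\gamma$; the paper's preconditioned PPA with a semismooth Newton inner solver is a different, second-order scheme that uses the same proximal maps.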

Paper Structure

This paper contains 19 sections, 5 theorems, 74 equations, 4 figures, 9 tables, 2 algorithms.

Key Result

Theorem 2.1

Suppose Assumption \ref{assu:regularity} holds and $S$ in \ref{eq:S} is positive definite. If $\rho_n/n\rightarrow \rho_0\geq 0$ and $\lambda_n/n\rightarrow \lambda_0\geq 0$, then … in probability as $n\rightarrow \infty$, where … In particular, if $\rho_n = o(n)$ and $\lambda_n = o(n)$, then $\underset{U\in \mathbb{R}^{m\times q}, \beta \in \mathbb{R}^p}{\arg\min}\ Z(U,\beta) = (B,\gamma)$ and therefore …

Figures (4)

  • Figure 1: Comparison of the four estimators VML, NL, NFL, and NSGL under square-shaped $B$ and the $\gamma$-generating schemes (S1), (S2), and (S3), shown from the top row to the bottom row. In each subgraph, the left square depicts the matrix coefficients and the right rectangle depicts the vector coefficients. For clarity, only the segment from 100 to 400 of the estimated vector coefficients is displayed for (S1) and (S2), while for (S3) the segment from 210 to 260 (the 5th group) is displayed. For each estimated coefficient matrix, the color limits are set to 0 and 1: entries with non-positive values are mapped to white, entries with values greater than 1 are mapped to black, and entries with values between 0 and 1 are mapped uniformly to a color scale ranging from white to black. For the estimated vector coefficients, the color limits are 0 and 1 for (S1) and (S2); for (S3), to highlight the group structure, entries whose absolute values are smaller than 0.2 are mapped to gray, entries greater than $0.2$ to black, and entries less than $-0.2$ to white.
  • Figure 2: Time vs. $R_{\rm obj}$ (see \ref{eq:Robj}) of PPDNA, ADMM, and Nesterov algorithm on synthetic data. The penalty parameters are taken from Table \ref{table:new_format_update} with asterisks (*).
  • Figure 3: Time vs. $R_{\rm obj}$ (see \ref{eq:Robj}) of PPDNA and ADMM on synthetic data. The penalty parameters are taken from Table \ref{table:synthetic_efficiency_vec} with asterisks (*).
  • Figure 4: The true shapes of $B\in\mathbb{R}^{64\times64}$ used in Section \ref{sec:2d}.

Theorems & Definitions (9)

  • Theorem 2.1
  • Theorem 2.2
  • proof
  • Theorem 3.1
  • Theorem 3.2
  • Remark 3.3
  • Proposition 3.4
  • proof
  • Definition B.1