The Generalized Elastic Net for least squares regression with network-aligned signal and correlated design

Huy Tran, Sansen Wei, Claire Donnat

TL;DR

The paper introduces the Generalized Elastic Net (GEN), a regression framework whose graph-incidence-based $\ell_1+\ell_2$ penalty exploits smoothness or piecewise-constant structure of the signal with respect to a given graph, and which handles correlated designs by augmenting the least-squares loss with this graph-aware regularizer. It provides non-asymptotic error bounds that depend on the graph, the spectrum of $\Sigma$, and a decomposition of the signal into kernel and orthogonal components, showing that the $\ell_2$ term improves conditioning and tightens the prediction and estimation guarantees. A dual coordinate descent (CD) algorithm is developed for efficient computation at scale; runtime analysis and comparisons against IP, ADMM, and ECOS show that it scales favorably. Extensive synthetic and real-data experiments (including COVID-19, Alzheimer's disease, and Chicago crime datasets) illustrate GEN's superior performance when signals align with the graph, highlighting its adaptability to diverse graph structures and correlated designs. The work also discusses practical considerations and limitations, such as hyperparameter tuning and the sensitivity of the results to the network-alignment assumption.

Abstract

We propose a novel $\ell_1+\ell_2$-penalty, which we refer to as the Generalized Elastic Net, for regression problems where the feature vectors are indexed by vertices of a given graph and the true signal is believed to be smooth or piecewise constant with respect to this graph. Under the assumption of correlated Gaussian design, we derive upper bounds for the prediction and estimation errors, which are graph-dependent and consist of a parametric rate for the unpenalized portion of the regression vector and another term that depends on our network alignment assumption. We also provide a coordinate descent procedure based on the Lagrange dual objective to compute this estimator for large-scale problems. Finally, we compare our proposed estimator to existing regularized estimators on a number of real and synthetic datasets and discuss its potential limitations.
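
To make the shape of the penalty concrete, the following is a minimal sketch of a GEN-style objective, not the authors' implementation. It assumes the objective is the least-squares loss plus $\lambda_1\|\Gamma\beta\|_1 + \lambda_2\|\Gamma\beta\|_2^2$, where $\Gamma$ is the edge-incidence matrix of the graph; the $1/n$ scaling of the loss, the squared $\ell_2$ term, and the function name `gen_objective` are assumptions made for illustration and are not stated in this summary.

```python
import numpy as np

def gen_objective(beta, X, y, gamma, lam1, lam2):
    """Assumed GEN-style objective (illustrative only):
    (1/n) * ||y - X beta||_2^2 + lam1 * ||Gamma beta||_1 + lam2 * ||Gamma beta||_2^2.
    The exact normalization used in the paper is not given in this summary.
    """
    n = X.shape[0]
    resid = y - X @ beta          # least-squares residuals
    diffs = gamma @ beta          # signal differences across graph edges
    fit = resid @ resid / n
    penalty = lam1 * np.abs(diffs).sum() + lam2 * (diffs @ diffs)
    return fit + penalty

# Toy example on a 4-vertex chain graph: row e of gamma is e_e - e_{e+1}.
p = 4
gamma = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
beta = np.array([1.0, 1.0, 3.0, 3.0])   # piecewise constant, one jump of size 2
X = np.eye(p)
y = X @ beta                            # noiseless, so the fit term is zero

print(gen_objective(beta, X, y, gamma, lam1=0.5, lam2=0.25))
# 2.0 = 0.5 * 2 + 0.25 * 4 (the fit term is zero here)
```

A constant signal lies in the kernel of $\Gamma$ and incurs no penalty under this form, which is why only the part of $\beta$ outside the kernel is regularized.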

Paper Structure (29 sections, 24 theorems, 146 equations, 16 figures, 5 tables, 2 algorithms)


Key Result

theorem 2.1

Fix $\delta > 0$ and choose $\lambda_1 = 32\sigma \rho(\Gamma)\sqrt{\frac{\gamma_{\max}(\Sigma)\log p}{n}}$, $\lambda_2 \leq \frac{\lambda_1}{8\|\Gamma\beta^*\|_\infty}$. Given any set $S$ satisfying both and with probability at least $1-c_1\exp(-nc_2) - \frac{2}{m} - e^{-\delta^2/2}$ we have

Figures (16)

  • Figure 1: Runtimes of different algorithms (reported on the log scale) when (a) $p$ is fixed but $n$ increases, or (b) $n$ is fixed but $p$ increases. The tolerance levels for IP, CD, and ECOS are set at $10^{-4}$. The tolerance level for ADMM is $10^{-3}$. Signals are defined on a 1D chain graph with $p$ vertices. In both situations, CD has the best runtime scaling, and IP scales better than ECOS.
  • Figure 2: Runtimes of different algorithms (reported on the log scale) when $n$ is fixed but $p$ increases. (a) Signals are defined on a $p$-vertex 2D grid graph ($m = 2p - 2\sqrt{p}$) with $\|\Gamma\beta^*\|_\infty = 0.66$. (b) Signals are defined on a $p$-vertex star graph ($m = p-1$) with $\|\Gamma\beta^*\|_\infty = 0.5$. The tolerance levels for IP, CD, and ECOS are set at $10^{-4}$. The tolerance level for ADMM is $10^{-3}$. As before, $(\lambda_1, \lambda_2)$ are chosen according to theory. In both situations, CD has the best runtime scaling.
  • Figure 3: Left: the covariance matrix obtained for a 2D grid graph with $p = 3 \times 3$ vertices. Right: the covariance matrix obtained for a barbell graph with two cliques $\{1,2,3\}$ and $\{7,8,9\}$ connected by the path $\{3,4,5,6,7\}$. Note that correlation is higher for adjacent or nearby vertices.
  • Figure 4: True signals defined on the chain graph with $p = 110$. The top left signal is piecewise constant and has the smallest $\|\Gamma\beta^*\|_0 = 3$ but the largest $\|\Gamma\beta^*\|_\infty = 5$. The bottom right signal is the smoothest with the largest $\|\Gamma\beta^*\|_0 = 99$ and the smallest $\|\Gamma\beta^*\|_\infty = 0.24$. The intermediate signals are constructed such that $\|\Gamma\beta^*\|_0$ decreases but $\|\Gamma\beta^*\|_\infty$ increases gradually. All 6 signals have $\|\Gamma\beta^*\|_1 = 15$.
  • Figure 5: True signals defined on the 2D grid with $p = 15 \times 15$. The top left signal is piecewise constant and has the smallest $\|\Gamma\beta^*\|_0 = 28$ but the largest $\|\Gamma\beta^*\|_\infty = 3$. The bottom right signal is the smoothest with the largest $\|\Gamma\beta^*\|_0 = 412$ and the smallest $\|\Gamma\beta^*\|_\infty = 0.24$. All 6 signals have $\|\Gamma\beta^*\|_1$ between 84 and 120.
  • ...and 11 more figures
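
The graph quantities quoted in the captions above can be checked with a short sketch. It builds edge-incidence matrices for the chain, 2D grid, and star graphs, confirms the edge counts $m = p-1$ (chain and star) and $m = 2p - 2\sqrt{p}$ (grid), and computes $\|\Gamma\beta\|_0$, $\|\Gamma\beta\|_1$, and $\|\Gamma\beta\|_\infty$ for a toy signal. The helper names and the sign convention for $\Gamma$ (+1 at one endpoint, -1 at the other) are assumptions for illustration; the toy signal is not one of the paper's.

```python
import numpy as np

def incidence_from_edges(edges, p):
    """m x p edge-incidence matrix: row e has +1 at one endpoint and -1 at the other."""
    gamma = np.zeros((len(edges), p))
    for row, (i, j) in enumerate(edges):
        gamma[row, i] = 1.0
        gamma[row, j] = -1.0
    return gamma

def chain_edges(p):
    """1D chain on p vertices: p - 1 edges."""
    return [(i, i + 1) for i in range(p - 1)]

def grid_edges(side):
    """2D grid on p = side * side vertices: 2p - 2*sqrt(p) edges."""
    edges = []
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                edges.append((v, v + 1))        # horizontal neighbor
            if r + 1 < side:
                edges.append((v, v + side))     # vertical neighbor
    return edges

def star_edges(p):
    """Star on p vertices with hub 0: p - 1 edges."""
    return [(0, j) for j in range(1, p)]

def difference_norms(gamma, beta, tol=1e-12):
    """Return (||Gamma beta||_0, ||Gamma beta||_1, ||Gamma beta||_inf)."""
    d = np.abs(gamma @ beta)
    return int(np.count_nonzero(d > tol)), float(d.sum()), float(d.max())

# Edge counts for the graph sizes mentioned in the captions.
print(len(chain_edges(110)))   # 109 = p - 1
print(len(grid_edges(15)))     # 420 = 2 * 225 - 2 * 15
print(len(star_edges(10)))     # 9 = p - 1

# Toy piecewise-constant signal on a 6-vertex chain (hypothetical example).
gamma = incidence_from_edges(chain_edges(6), 6)
beta = np.array([0.0, 0.0, 2.0, 2.0, 2.0, 5.0])
print(difference_norms(gamma, beta))   # (2, 5.0, 3.0)
```

The counts are consistent with the captions: the largest reported $\|\Gamma\beta^*\|_0$ values (99 on the chain with $p = 110$, 412 on the $15 \times 15$ grid) do not exceed the corresponding numbers of edges (109 and 420).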

Theorems & Definitions (34)

  • theorem 2.1: Main theorem
  • lemma 2.1: Restricted eigenvalue property for random Gaussian design
  • corollary 2.1
  • corollary 2.2
  • corollary 2.3
  • lemma 2.2
  • corollary 2.4
  • lemma 2.3
  • corollary 2.5
  • lemma 2.4
  • ...and 24 more