
SPPCSO: Adaptive Penalized Estimation Method for High-Dimensional Correlated Data

Ying Hu, Hu Yang

TL;DR

The proposed Single-Parametric Principal Component Selection Operator (SPPCSO) is a penalized estimation method that integrates single-parametric principal component regression with $L_{1}$ regularization, using principal component information to adaptively adjust the shrinkage factor.

Abstract

With the rise of high-dimensional correlated data, multicollinearity poses a significant challenge to model stability, often leading to unstable estimation and reduced predictive accuracy. This work proposes the Single-Parametric Principal Component Selection Operator (SPPCSO), an innovative penalized estimation method that integrates single-parametric principal component regression and $L_{1}$ regularization to adaptively adjust the shrinkage factor by incorporating principal component information. This approach achieves a balance between variable selection and coefficient estimation, ensuring model stability and robust estimation even in high-dimensional, high-noise environments. The primary contribution lies in addressing the instability of traditional variable selection methods when applied to high-noise, high-dimensional correlated data. Theoretically, our method exhibits selection consistency and achieves a smaller estimation error bound compared to traditional penalized estimation approaches. Extensive numerical experiments demonstrate that SPPCSO not only delivers stable and reliable estimation in high-noise settings but also accurately distinguishes signal variables from noise variables in group-effect structured data with highly correlated noise variables, effectively eliminating redundant variables and achieving more stable variable selection. Furthermore, SPPCSO successfully identifies disease-associated genes in gene expression data analysis, showcasing strong practical value. The results indicate that SPPCSO serves as an ideal tool for high-dimensional variable selection, offering an efficient and interpretable solution for modeling correlated data.
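To make the abstract's motivation concrete, the following minimal sketch shows how multicollinearity destabilizes ordinary least squares estimates. It is not an implementation of SPPCSO. The function name `ols_coef_spread`, the equicorrelated design, and all parameter values are assumptions chosen purely for illustration and do not come from the paper.

```python
import numpy as np


def ols_coef_spread(rho, n=100, p=5, n_rep=200, seed=0):
    """Return (mean std. dev. of OLS coefficients, condition number of X).

    Hypothetical helper: predictors are Gaussian with pairwise correlation
    `rho`; the response is regenerated `n_rep` times with fresh unit-variance
    noise while the design matrix X is held fixed.
    """
    rng = np.random.default_rng(seed)
    # Equicorrelation covariance: 1 on the diagonal, rho elsewhere.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.ones(p)  # assumed true coefficients

    estimates = np.empty((n_rep, p))
    for r in range(n_rep):
        y = X @ beta + rng.normal(size=n)
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

    # Spread of each coefficient across noise draws, averaged over coefficients.
    return estimates.std(axis=0).mean(), np.linalg.cond(X)


if __name__ == "__main__":
    for rho in (0.0, 0.99):
        spread, cond = ols_coef_spread(rho)
        print(f"rho={rho}: coefficient spread={spread:.3f}, cond(X)={cond:.1f}")
```

Under these assumptions, the strongly correlated design should show a larger condition number and a visibly wider spread of coefficient estimates. This is the kind of instability that penalized methods for correlated data aim to reduce.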

Paper Structure

This paper contains 11 sections, 4 theorems, 47 equations, 6 figures, 8 tables, 1 algorithm.

Key Result

Proposition 1

We define an artificial data set $(y^{*},X^{*})$ by:

Figures (6)

  • Figure 1: Solution path of the SPPCSO with respect to the parameter $\lambda$
  • Figure 2: Cross-validation curve with respect to the parameter $\lambda$
  • Figure 3: Cross-validation curve with respect to the parameter $\theta$
  • Figure 4: Performance comparison of methods. Left: radar chart of TPR, TNR, and TMR values for Example 1. Right: the corresponding radar chart for Example 2.
  • Figure 5: Box plot of MAPE values
  • ...and 1 more figure

Theorems & Definitions (4)

  • Proposition 1
  • Lemma 1
  • Theorem 2
  • Theorem 3