Penalized Generative Variable Selection

Tong Wang, Jian Huang, Shuangge Ma

TL;DR

The paper proposes a penalized two-stage framework for variable selection with high-dimensional predictors, built on conditional Wasserstein GANs. Stage 1 applies a group Lasso penalty to the generator's first-layer weights to identify important predictors; Stage 2 re-estimates the model using only the selected predictors. The approach is extended to censored survival data via Kaplan-Meier weighting. The authors establish convergence rates and selection consistency under varying dimensionality, and validate the method through simulations and real-data analyses (TCGA-LUAD, MIMIC-III albumin and survival, HIV mutations). In practice, the method pairs variable selection with distribution-matching estimation in settings with censoring and complex predictor structures, which can improve interpretability and prediction.
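To make the Stage 1 idea concrete, here is a minimal NumPy sketch of a group Lasso penalty on a first-layer weight matrix, with one group per predictor. It is an illustration only, not the authors' implementation: the layout of `W` (hidden units by predictors), the function names, the block soft-thresholding step, and the tolerance `tol` are all assumptions introduced here.

```python
import numpy as np


def group_norms(W):
    """Euclidean norm of each column of W (assumed: one column per predictor)."""
    return np.linalg.norm(W, axis=0)


def group_lasso_penalty(W, lam):
    """lam * sum_j ||W[:, j]||_2, a group Lasso penalty over predictor groups."""
    return lam * group_norms(W).sum()


def prox_group_lasso(W, threshold):
    """Block soft-thresholding: shrink each column norm by `threshold`.

    Columns whose norm is at most `threshold` are set exactly to zero.
    This is a standard way to obtain exact zeros; the paper may optimize differently.
    """
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return W * scale


def selected_predictors(W, tol=1e-8):
    """Indices of predictors whose first-layer weight group is nonzero."""
    return np.flatnonzero(group_norms(W) > tol)


# Toy first-layer weights: 2 hidden units, 3 predictors (hypothetical values).
W = np.array([[3.0, 0.0, 0.1],
              [4.0, 0.0, 0.0]])
penalty = group_lasso_penalty(W, lam=2.0)
W_shrunk = prox_group_lasso(W, threshold=0.5)
kept = selected_predictors(W_shrunk)
```

In a two-stage workflow of the kind described above, the indices in `kept` would define the reduced predictor set passed to Stage 2.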

Abstract

Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analysis, variable selection can be needed along with estimation/model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling/estimation using the conditional Wasserstein Generative Adversarial networks. Group Lasso penalization is applied for variable selection, which may improve model estimation/prediction, interpretability, stability, etc. Significantly advancing from the existing literature, the analysis of censored survival data is also considered. We establish the convergence rate for variable selection while considering the approximation error, and obtain a more efficient distribution estimation. Simulations and the analysis of real experimental data demonstrate satisfactory practical utility of the proposed analysis.
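The censored-data extension relies on Kaplan-Meier weighting, whose exact form is not given in this summary. As a hedged illustration, the sketch below computes Kaplan-Meier (Stute-type) weights, one common way to reweight observations under right censoring. The function name and the tie-breaking rule (events ordered before censored observations at equal times) are assumptions made here, and the paper's weighting may differ.

```python
import numpy as np


def kaplan_meier_weights(time, event):
    """Kaplan-Meier (Stute-type) weights for right-censored data.

    time  : observed times (event or censoring time)
    event : 1 if the event was observed, 0 if censored
    Returns weights in the original order; censored observations get weight 0.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = len(time)

    # Sort by time; at ties, place events before censored observations.
    order = np.lexsort((1.0 - event, time))
    d = event[order]
    i = np.arange(1, n + 1)

    # Factor ((n - j) / (n - j + 1)) ** d_j for each ordered observation j.
    factors = ((n - i) / (n - i + 1.0)) ** d
    # Product over j < i, so shift the cumulative product by one position.
    prev_prod = np.concatenate(([1.0], np.cumprod(factors)[:-1]))

    w_sorted = d / (n - i + 1.0) * prev_prod
    weights = np.empty(n)
    weights[order] = w_sorted
    return weights


w_uncensored = kaplan_meier_weights([2.0, 1.0, 3.0], [1, 1, 1])
w_censored = kaplan_meier_weights([1.0, 2.0, 3.0], [1, 0, 1])
```

With no censoring the weights reduce to the usual 1/n. With censoring, the mass of each censored observation is redistributed to later event times, so a weighted loss built from these weights uses only the observed events.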

Paper Structure

This paper contains 31 sections, 11 theorems, 92 equations, 2 figures, 13 tables, 2 algorithms.

Key Result

Theorem 1

Suppose that Conditions C:XY-C:Lojasewica hold. If $p=o(\log^{c} n_1)$ with $0<c<1$ and ${\lambda_n}=O({n_1}^{-\frac{1}{2(p+1)}})$, $\hat{\bm{\theta}}_1$ defined in (eq:est-stage1) with first-layer weights $\bm{w}_0(\hat{\bm{\theta}}_1)=(\hat{\bm{\mu}}_1,\hat{\bm{\nu}}_1,\hat{\bm{\upsilon}}_1)$ satisfies [...], where $a>2$ is a positive constant, and $\mathbb{E}$ is taken with respect to $\{(\bm{X}_i,Y_i)\}_{\ldots}$ [...]

Figures (2)

  • Figure 1: Upper: Architecture of the penalized generator network $g_{\bm{\theta}_1}$. Lower: Training process of the proposed method.
  • Figure A.2: Analysis of the MIMIC-III data on survival: identification frequencies.

Theorems & Definitions (20)

  • Remark 1
  • Definition 1: Importance
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • Proposition 1
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 10 more