Variable Selection Methods for Multivariate, Functional, and Complex Biomedical Data in the AI Age

Marcos Matabuena

TL;DR

This work proposes new optimization-based variable selection methods, built on best-subset selection, for multivariate, functional, and even more general outcomes in metric spaces, and demonstrates that the proposed methodology outperforms state-of-the-art methods in accuracy and speed.

Abstract

Many problems within personalized medicine and digital health rely on the analysis of continuous-time functional biomarkers and other complex data structures emerging from high-resolution patient monitoring. In this context, this work proposes new optimization-based variable selection methods, built on best-subset selection, for multivariate, functional, and even more general outcomes in metric spaces. Our framework applies to several types of regression models, including linear, quantile, and nonparametric additive models, and to a broad range of random responses, such as univariate and multivariate Euclidean data, functional data, and even random graphs. Our analysis demonstrates that the proposed methodology outperforms state-of-the-art methods in accuracy and, especially, in speed, achieving improvements of several orders of magnitude over competitors across various types of statistical responses, such as mathematical functions. While our framework is general and is not designed for a specific regression or scientific problem, the article is self-contained and focuses on biomedical applications. In clinical areas, it serves as a valuable resource for professionals in biostatistics, statistics, and artificial intelligence who are interested in the variable selection problem in this new technological AI era.

Paper Structure (35 sections, 7 theorems, 44 equations, 5 figures, 5 tables)

This paper contains 35 sections, 7 theorems, 44 equations, 5 figures, 5 tables.

Introduction
Literature overview
Applications
Specific Scientific Goals
Paper contributions
Paper outline
Model formulation
Boolean relaxation and efficient algorithms
Tightness result
Dual sub-gradient algorithm
Extension to Group Variable Selection
Variable Selection in Metric Space Responses
Global Fréchet Model
Variable Selection in Metric Spaces with Penalized Ridge Algorithm
Metric spaces of Negative Type
...and 20 more sections

Key Result

Theorem 1

For any convex loss functions $\ell_t$, and assuming an additive linear structure across the different loss functions $\ell_t$, $t\in [m]$, the optimization problem (eqn:generic.ss) is equivalent to … where $\hat{\ell}(y,a):= \max_{u\in \mathbb{R}} \{u a-\ell(y,u)\}$ is a convex function known as the Fenchel conjugate of $\ell$ (bauschke2012fenchel). In particular, the function $f$ is continuous, linear in $s$, and conc
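The Fenchel conjugate in the theorem can be made concrete with a small numerical check. The sketch below assumes the squared-error loss $\ell(y,u) = \tfrac{1}{2}(y-u)^2$, which is an illustrative choice and not one fixed by the source; the function names are hypothetical. For this loss the maximizer is $u^* = y + a$, so the conjugate has the closed form $\hat{\ell}(y,a) = a y + \tfrac{1}{2}a^2$, and a grid search over $u$ should approximately reproduce it.

```python
def squared_loss(y, u):
    """Squared-error loss 0.5 * (y - u)^2 (illustrative assumption)."""
    return 0.5 * (y - u) ** 2


def fenchel_conjugate_grid(loss, y, a, lo=-50.0, hi=50.0, n=20001):
    """Approximate max_u [u * a - loss(y, u)] by a grid search on [lo, hi].

    The grid bounds and resolution are arbitrary choices for this sketch;
    the true maximizer must lie inside [lo, hi] for the result to be close.
    """
    step = (hi - lo) / (n - 1)
    best = float("-inf")
    for i in range(n):
        u = lo + i * step
        value = u * a - loss(y, u)
        if value > best:
            best = value
    return best


def squared_loss_conjugate(y, a):
    """Closed form for the squared loss: maximizer u* = y + a."""
    return a * y + 0.5 * a ** 2


if __name__ == "__main__":
    for y, a in [(1.3, -0.7), (0.0, 2.0), (-4.0, 0.5)]:
        numeric = fenchel_conjugate_grid(squared_loss, y, a)
        exact = squared_loss_conjugate(y, a)
        print(f"y={y}, a={a}: grid={numeric:.6f}, closed form={exact:.6f}")
```

The grid value can only underestimate the true maximum, so the two numbers agree up to the grid resolution. Other convex losses can be passed to `fenchel_conjugate_grid` in the same way, provided the search interval contains the maximizer.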

Figures (5)

Figure 1: Variation in the mean and standard deviation of glucose values for a diabetic individual depending on the day of the week and time of day.
Figure 2: Average glucose trajectories (left) and standard deviation trajectories (right).
Figure 3: Left: Raw CGM time series of two individuals. Center: The corresponding density functions. Right: The corresponding quantile representation.
Figure 4: Raw Quantile Outcomes
Figure 5: P-values across the temporal domain of the statistical significance of each variable selected

Theorems & Definitions (17)

Remark 1
Theorem 1
Theorem 1
Remark 2
Definition 1
Theorem 2: Schoenberg (1937, 1938)
Example 1: Laplacian graph
Example 2: The 2-Wasserstein Distance in the univariate case
Remark 3
Proposition 3
...and 7 more
