Kernel-Based Learning of Chest X-ray Images for Predicting ICU Escalation among COVID-19 Patients

Qiyuan Shi, Jian Kang, Yi Li

TL;DR

This work extends multiple kernel learning (MKL) to outcome variables from the exponential family, covering a broader variety of data types, and calls the proposed method generalized linear models with integrated multiple additive regression with kernels (GLIMARK).

Abstract

Kernel methods have been widely used in machine learning for classification and prediction because they can capture complex non-linear patterns in data. However, single-kernel approaches are inherently limited. They rely on one type of kernel function (e.g., a Gaussian kernel), which may be insufficient to represent the heterogeneous, multifaceted nature of real-world data. Multiple kernel learning (MKL) addresses this limitation by constructing composite kernels from simpler ones and integrating information from heterogeneous sources. Despite these advances, traditional MKL methods are designed primarily for continuous outcomes. We extend MKL to outcome variables from the exponential family, which covers a broader variety of data types, and call the proposed method generalized linear models with integrated multiple additive regression with kernels (GLIMARK). Empirically, we show that GLIMARK can effectively recover or approximate the true data-generating mechanism. We apply it to a COVID-19 chest X-ray dataset to predict the binary outcome of ICU escalation and to extract clinically meaningful features, underscoring the practical utility of the approach in real-world settings.
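The abstract describes MKL as combining simpler kernels into a composite one and GLIMARK as extending this to exponential-family outcomes. The following is a minimal, generic sketch of that idea and is not the authors' GLIMARK estimator. It makes several assumptions. The kernel weights are fixed by hand rather than learned. Each base kernel acts on one hypothetical feature subset ("view"). The penalty is a ridge-type RKHS penalty `lam`. The links are canonical. The coefficients are fitted by plain functional gradient descent. All function names, kernel choices, and hyperparameter values are illustrative. In particular, `lam` is not necessarily the paper's $\lambda$, and nothing in the sketch corresponds to $T$.

```python
import math

# Illustrative base kernels (assumed; not necessarily the paper's kernel set).
def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def linear_kernel(u, v):
    """Linear kernel: inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def composite_kernel(x, z, views, weights):
    """Weighted sum of base kernels, each applied to its own feature subset ("view")."""
    total = 0.0
    for (kernel, idx), w in zip(views, weights):
        total += w * kernel([x[i] for i in idx], [z[i] for i in idx])
    return total

def sigmoid(t):
    """Numerically stable logistic function (inverse of the logit link)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

# Inverse canonical link (mean function) for three exponential-family outcomes.
MEAN_FNS = {"gaussian": lambda eta: eta, "bernoulli": sigmoid, "poisson": math.exp}

def fit_kernel_glm(X, y, views, weights, family="bernoulli", lam=0.01, lr=0.5, n_iter=500):
    """Penalized kernel GLM fitted by functional gradient descent (hypothetical helper).

    Model: eta(x) = b + sum_j alpha_j * K(x_j, x), where K is the composite kernel.
    With a canonical link, d(negative log-likelihood)/d(eta_i) = mu_i - y_i, so the
    RKHS gradient step on f becomes the coefficient update on alpha below.
    """
    n = len(X)
    mean_fn = MEAN_FNS[family]
    K = [[composite_kernel(X[i], X[j], views, weights) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0
    for _ in range(n_iter):
        eta = [b + sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        resid = [mean_fn(e) - yi for e, yi in zip(eta, y)]  # mu_i - y_i
        alpha = [a - lr * (r / n + lam * a) for a, r in zip(alpha, resid)]
        b -= lr * sum(resid) / n  # intercept is left unpenalized

    def predict(x):
        """Fitted mean mu(x) = g^{-1}(eta(x))."""
        eta = b + sum(a * composite_kernel(xj, x, views, weights) for a, xj in zip(alpha, X))
        return mean_fn(eta)

    return predict

# Toy binary example with two hypothetical views: an RBF view on feature 0
# and a linear view on features 0 and 1, equally weighted.
X = [[-2.0, 0.1], [-1.5, -0.2], [-1.0, 0.3], [-0.5, 0.0],
     [0.5, 0.1], [1.0, -0.3], [1.5, 0.2], [2.0, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
views = [(gaussian_kernel, [0]), (linear_kernel, [0, 1])]
predict = fit_kernel_glm(X, y, views, weights=[0.5, 0.5], family="bernoulli")
print(predict([2.0, 0.0]), predict([-2.0, 0.0]))
```

With a canonical link, the gradient with respect to the linear predictor is always $\mu - y$. Switching between Gaussian, Bernoulli, and Poisson outcomes in this sketch therefore only changes the mean function. Learning the kernel weights themselves, as MKL methods do, is deliberately left out.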

Paper Structure (17 sections, 13 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1:
  • Figure 2: Simulation results. Red text represents the contribution percentage of each view. G# denotes a group, while C# refers to a possible combination. Asterisks indicate the truly active combinations in the true model. Scenarios A–E show the feature detection performance of GLIMARK, while scenarios F–G show its overall performance as the two hyperparameters $\lambda$ and $T$ vary.
  • Figure 3: Performance comparison. For Random Forest, each point represents the average cross-validation performance for a fixed $n_{\text{estimators}}$. For GLIMARK, since performance depends on both $\lambda$ and $T$, each point represents the highest average cross-validation performance obtained across different $\lambda$ values for a given $T$.
  • Figure 4: Feature importance. For logistic models, contribution percentages are computed as the sum of absolute coefficient values within each class. For Random Forest, importance is based on the sum of Gini importance within each class. All models except the basic logistic regression correspond to their best fine-tuned configurations. For the GLIMARK model, Level 3 estimates are merged into their corresponding Level 2 feature classes, and the small contributions from Level 1 are omitted because Level 1 does not provide feature-class estimates.
  • Figure 5: Coefficient contribution and top representers. For each model, only the three most influential points are displayed.
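The Figure 4 caption describes per-class contributions as sums of absolute coefficient values, with Level 3 estimates merged into their Level 2 classes. A minimal sketch of that arithmetic follows. Two details are assumptions: normalizing by the total across classes to obtain percentages, and the helper name `contribution_percentages`. The feature names and class labels in the example are hypothetical. Passing nonnegative Gini importances instead of coefficients gives the Random Forest variant, since `abs` leaves them unchanged.

```python
from collections import defaultdict

def contribution_percentages(coefs, feature_class):
    """Percent of the total |coefficient| mass falling in each feature class.

    coefs: feature name -> fitted coefficient (or a nonnegative importance score).
    feature_class: feature name -> class label; mapping a Level 3 feature to its
    Level 2 parent class merges its contribution into that class.
    """
    totals = defaultdict(float)
    for name, value in coefs.items():
        totals[feature_class[name]] += abs(value)
    grand_total = sum(totals.values())
    return {cls: 100.0 * t / grand_total for cls, t in totals.items()}

coefs = {"f1": 0.5, "f2": -1.0, "f3": 0.25, "f4": -0.25}
classes = {"f1": "G1", "f2": "G1", "f3": "G2", "f4": "G2"}
print(contribution_percentages(coefs, classes))  # {'G1': 75.0, 'G2': 25.0}
```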