An introduction to R package `mvs`

Wouter van Loon

TL;DR

This paper introduces the R package `mvs` for multi-view stacking (MVS) in biomedical data. It describes the workflow where a base-learner is trained on each view, cross-validated predictions are fed into a meta-learner to obtain final predictions, and views can be automatically selected. The package provides two main fitting engines, `StaPLR` for two-level stacking and `MVS` for two or more levels, with support for multiple outcome distributions (Gaussian, Binomial, Poisson), penalties such as model relaxation and adaptive weights, and meta-level imputation for missing data. It also provides view-importance metrics, including meta-level coefficients and the minority report measure (MRM), and supports parallel computation to handle high-dimensional, multi-view problems. Together, these features enable interpretable, scalable, and flexible analysis of multi-view biomedical data.

Abstract

In biomedical science, a set of objects or persons can often be described by multiple distinct sets of features obtained from different data sources or modalities (called "multi-view data"). Classical machine learning methods ignore the multi-view structure of such data, limiting model interpretability and performance. The R package `mvs` provides methods that were designed specifically for dealing with multi-view data, based on the multi-view stacking (MVS) framework. MVS is a form of supervised (machine) learning used to train multi-view classification or prediction models. MVS works by training a learning algorithm on each view separately, estimating the predictive power of each view-specific model through cross-validation, and then using another learning algorithm to assign weights to the view-specific models based on their estimated predictions. MVS is a form of ensemble learning, dividing the large multi-view learning problem into smaller sub-problems. Most of these sub-problems can be solved in parallel, making it computationally attractive. Additionally, the number of features of the sub-problems is greatly reduced compared with the full multi-view learning problem. This makes MVS especially useful when the total number of features is larger than the number of observations (i.e., high-dimensional data). MVS can still be applied even if the sub-problems are themselves high-dimensional by adding suitable penalty terms to the learning algorithms. Furthermore, MVS can be used to automatically select the views which are most important for prediction. The R package `mvs` makes fitting MVS models, including such penalty terms, easily and openly accessible. `mvs` allows for the fitting of stacked models with any number of levels, with different penalty terms, different outcome distributions, and provides several options for missing data handling.
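The stacking procedure described in the abstract can be sketched in a few lines. The following is a minimal, illustrative Python sketch using only the standard library; it is not the `mvs` implementation. All function names (`fit_logistic`, `multi_view_stack`, `predict_stack`) and the toy data are hypothetical, and the learners are simplified assumptions: ridge-penalized logistic regression as base-learner and a nonnegativity-constrained logistic regression as meta-learner, both fitted by plain gradient descent.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l2=0.1, nonneg=False, steps=300, lr=0.5):
    # Logistic regression with a ridge penalty, fitted by gradient descent.
    # With nonneg=True the weights are projected onto [0, inf) after each step.
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(steps):
        gb, gw = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        for j in range(p):
            w[j] -= lr * (gw[j] / n + l2 * w[j])
            if nonneg:
                w[j] = max(0.0, w[j])
    return b, w

def predict(model, X):
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def take_cols(X, cols):
    return [[row[j] for j in cols] for row in X]

def multi_view_stack(X, y, views, nfolds=5, seed=1):
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::nfolds] for k in range(nfolds)]
    # Step 1: Z[i][v] is the out-of-fold prediction for observation i from view v.
    Z = [[0.0] * len(views) for _ in range(n)]
    for v, cols in enumerate(views):
        Xv = take_cols(X, cols)
        for test in folds:
            held = set(test)
            train = [i for i in idx if i not in held]
            m = fit_logistic([Xv[i] for i in train], [y[i] for i in train])
            for i, p in zip(test, predict(m, [Xv[i] for i in test])):
                Z[i][v] = p
    # Step 2: the meta-learner weights the views using the cross-validated predictions.
    meta = fit_logistic(Z, y, l2=0.0, nonneg=True)
    # Step 3: the final base-learners are refitted on all observations.
    base = [fit_logistic(take_cols(X, cols), y) for cols in views]
    return base, meta

def predict_stack(base, meta, X, views):
    per_view = [predict(m, take_cols(X, cols)) for m, cols in zip(base, views)]
    return predict(meta, list(zip(*per_view)))

# Toy data (hypothetical): view 0 carries the signal, view 1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1 if row[0] + row[1] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
views = [[0, 1], [2, 3]]

base, meta = multi_view_stack(X, y, views)
meta_w = meta[1]  # one nonnegative weight per view
preds = predict_stack(base, meta, X, views)
accuracy = sum((p > 0.5) == (yi == 1) for p, yi in zip(preds, y)) / len(y)
print("meta-level weights per view:", meta_w)
print("training accuracy:", accuracy)
```

In this sketch the noise view should receive a meta-level weight at or near zero, which mirrors the idea of view selection. The sketch only constrains the weights to be nonnegative; a sparsity-inducing penalty at the meta-level would be needed to set weights to exactly zero in general.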

Paper Structure

This paper contains 12 sections, 5 figures.

Figures (5)

  • Figure 1: A simple graphic representation of a multi-view stacking model including 3 views: structural MRI, functional MRI, and genetic information. A sub-model is fitted on each view separately, and the predictions of these sub-models are combined by the meta-learner into a single prediction. Note that the *n* persons are the same persons for each view.
  • Figure 2: A simple graphic representation of how StaPLR can perform automatic view selection. In this (hypothetical) example, functional MRI was discarded from the model because it was not sufficiently predictive of the outcome in the presence of the other two views.
  • Figure 3: The MVS algorithm represented as a flow diagram. StaPLR denotes the special case where all learners are penalized logistic regression learners. Figure adapted from [@StaPLR4].
  • Figure 4: A simple graphic representation of meta-level imputation. Assume, for example, that the three views consist of, respectively, 100, 1000 and 10,000 features. Now, say that there are 10 observations which have missing values on view $X^{(2)}$. Then in traditional imputation we would have to impute 10 × 1000 = 10,000 values whereas in list-wise deletion 10 × (100 + 10,000) = 101,000 values would be deleted even though they were observed. However, in meta-level imputation only 10 values have to be imputed, and no observed values are deleted. Figure adapted from [@StaPLR4].
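The arithmetic in the caption of Figure 4 can be checked with a short calculation. The snippet below is an illustrative sketch only; the function name `missing_data_costs` is hypothetical and not part of `mvs`. It assumes a two-level model in which each view contributes one cross-validated prediction per observation, and that every affected observation is missing one entire view.

```python
def missing_data_costs(view_sizes, missing_view, n_missing):
    # Count the values affected by each strategy when n_missing observations
    # lack all features of one view.
    size = view_sizes[missing_view]
    return {
        # Feature-level imputation fills in every missing feature value.
        "traditional_imputed": n_missing * size,
        # List-wise deletion discards the observed values in the other views.
        "listwise_deleted": n_missing * (sum(view_sizes) - size),
        # Meta-level imputation fills in one view-level prediction per observation.
        "meta_level_imputed": n_missing,
    }

# The example from the caption: views of 100, 1000 and 10,000 features,
# with 10 observations missing the second view.
costs = missing_data_costs([100, 1000, 10_000], missing_view=1, n_missing=10)
print(costs)
```

For the caption's example this gives 10,000 imputed values for traditional imputation, 101,000 deleted values for list-wise deletion, and 10 imputed values for meta-level imputation.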