View selection in multi-view stacking: Choosing the meta-learner

Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij

TL;DR

This study evaluates how the choice of meta-learner in multi-view stacking (MVS) affects both view selection and predictive accuracy across simulations and two gene-expression datasets. Seven nonnegative meta-learners are compared: the interpolating predictor, ridge, elastic net, lasso, adaptive lasso, stability selection, and nonnegative forward selection (NNFS). In each case a base-learner is trained on every view, and its cross-validated predictions are collected into a matrix Z that serves as input to the meta-learner. Results show that nonnegative lasso, nonnegative adaptive lasso, nonnegative elastic net, and NNFS consistently balance sparsity and accuracy, while ridge, stability selection, and the interpolating predictor can underperform, especially in high-dimensional or highly correlated settings; elastic net is preferable when correlated views are present, and lasso yields the sparsest solutions. In real data, the lasso often achieves the best accuracy with minimal views, whereas elastic net and ridge may improve AUC or H-measure at the cost of selecting more views, highlighting a practical trade-off between predictive performance and interpretability. Overall, the findings provide actionable guidance for selecting meta-learners in MVS to obtain accurate, sparse, and interpretable view selections in high-dimensional multi-view problems, including genomics and similar biomedical applications.

Abstract

Multi-view stacking is a framework for combining information from different views (i.e. different feature sets) describing the same set of objects. In this framework, a base-learner algorithm is trained on each view separately, and their predictions are then combined by a meta-learner algorithm. In a previous study, stacked penalized logistic regression, a special case of multi-view stacking, was shown to be useful in identifying which views are most important for prediction. In this article we expand this research by considering seven different algorithms to use as the meta-learner, and evaluating their view selection and classification performance in simulations and two applications on real gene-expression data sets. Our results suggest that if both view selection and classification accuracy are important to the research at hand, then the nonnegative lasso, nonnegative adaptive lasso and nonnegative elastic net are suitable meta-learners. Exactly which among these three is to be preferred depends on the research context. The remaining four meta-learners, namely nonnegative ridge regression, nonnegative forward selection, stability selection and the interpolating predictor, show few advantages that would justify preferring them over the other three.
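
The two-stage procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes toy simulated data, uses ridge least squares on 0/1 labels as a stand-in base-learner (the abstract refers to penalized logistic regression), and fits a nonnegative lasso meta-learner with a simple coordinate-descent loop. All function names, the fold count, and the penalty value `lam=0.01` are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 views of 5 features each; only views 0 and 1 carry signal.
n, n_views, n_feat = 200, 6, 5
views = [rng.normal(size=(n, n_feat)) for _ in range(n_views)]
signal = views[0].sum(axis=1) + views[1].sum(axis=1)
y = (signal + rng.normal(size=n) > 0).astype(float)


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (stand-in base-learner)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def cross_val_predictions(views, y, n_folds=5, seed=0):
    """Build Z: column v holds the out-of-fold predictions of the base-learner on view v."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    Z = np.zeros((n, len(views)))
    for v, X in enumerate(views):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            coef, intercept = ridge_fit(X[train_idx], y[train_idx])
            Z[test_idx, v] = X[test_idx] @ coef + intercept
    return Z


def nonnegative_lasso(Z, y, lam, n_iter=200):
    """Minimize (1/2n)||y - b0 - Z b||^2 + lam * sum(b) subject to b >= 0."""
    n, p = Z.shape
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, resid = Z - z_mean, y - y_mean
    col_sq = (Zc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid = resid + Zc[:, j] * beta[j]           # remove view j's contribution
            beta[j] = max(0.0, (Zc[:, j] @ resid / n - lam) / col_sq[j])
            resid = resid - Zc[:, j] * beta[j]           # add the updated contribution back
    return beta, y_mean - z_mean @ beta


Z = cross_val_predictions(views, y)
beta, intercept = nonnegative_lasso(Z, y, lam=0.01)
selected_views = np.flatnonzero(beta > 0)
print("meta-learner weights:", np.round(beta, 3))
print("selected views:", selected_views)
```

A view counts as selected when its meta-learner weight is strictly positive. In practice the penalty would be tuned (for example by cross-validation) rather than fixed, and the base-learners would be refit on the full data before the weights are applied to new observations.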

Paper Structure

This paper contains 34 sections, 6 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Boxplots of test accuracy for the different meta-learners, with 300 views and 25 features per view. The results are shown for all combinations of the correlation between features within the same view ($\rho_w$), the correlation between features from different views ($\rho_b$), and sample size ($n$). Each plot is based on 100 replications.
  • Figure 2: Boxplots of the true positive rate (TPR) for the different meta-learners, with 300 views and 25 features per view. The results are shown for all combinations of the correlation between features within the same view ($\rho_w$), the correlation between features from different views ($\rho_b$), and sample size ($n$). Each plot is based on 100 replications.
  • Figure 3: Boxplots of the false positive rate (FPR) for the different meta-learners, with 300 views and 25 features per view. The results are shown for all combinations of the correlation between features within the same view ($\rho_w$), the correlation between features from different views ($\rho_b$), and sample size ($n$). Each plot is based on 100 replications.
  • Figure 4: Boxplots of the false discovery rate (FDR) for the different meta-learners, with 300 views and 25 features per view. The results are shown for all combinations of the correlation between features within the same view ($\rho_w$), the correlation between features from different views ($\rho_b$), and sample size ($n$). Each plot is based on 100 replications.
  • Figure 5: Boxplots of the number of features per view for the breast cancer and colitis data sets.
  • ...and 12 more figures
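
The view-selection metrics named in the captions of Figures 2 to 4 (TPR, FPR, FDR) can be computed from a set of selected views and a set of truly relevant views. The sketch below uses the standard definitions; the function name is hypothetical, and the convention of reporting an FDR of 0 when no view is selected is an assumption, not something stated in the source.

```python
import numpy as np


def view_selection_rates(selected, relevant):
    """Return (TPR, FPR, FDR) for boolean arrays marking selected and truly relevant views."""
    selected = np.asarray(selected, dtype=bool)
    relevant = np.asarray(relevant, dtype=bool)
    tp = np.sum(selected & relevant)     # relevant views that were selected
    fp = np.sum(selected & ~relevant)    # irrelevant views that were selected
    fn = np.sum(~selected & relevant)    # relevant views that were missed
    tn = np.sum(~selected & ~relevant)   # irrelevant views correctly left out
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) > 0 else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) > 0 else 0.0  # assumed convention: 0 if nothing selected
    return float(tpr), float(fpr), float(fdr)


# Hypothetical example: views 0 and 1 are relevant; the meta-learner selected views 0 and 2.
relevant = [True, True, False, False, False, False]
selected = [True, False, True, False, False, False]
tpr, fpr, fdr = view_selection_rates(selected, relevant)
print(tpr, fpr, fdr)
```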