
Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification

Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Jussi Tohka, Vanessa Gómez-Verdejo

TL;DR

BALDUR is a novel Bayesian algorithm designed to handle multi-modal datasets with small sample sizes in high-dimensional settings while providing explainable solutions. It outperforms state-of-the-art models and detects features aligned with markers already described in the scientific literature.

Abstract

We present BALDUR, a novel Bayesian algorithm designed to handle multi-modal datasets with small sample sizes in high-dimensional settings while providing explainable solutions. To do so, the proposed model combines the different data views within a common latent space, extracting the information relevant to the classification task and pruning out irrelevant or redundant features and data views. Furthermore, to provide generalizable solutions in small-sample scenarios, BALDUR efficiently integrates dual kernels over the views with a small sample-to-feature ratio. Finally, its linear nature ensures the explainability of the model outcomes, allowing its use for biomarker identification. The model was tested on two neurodegeneration datasets, where it outperformed state-of-the-art models and detected features aligned with markers already described in the scientific literature.
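The abstract pairs dual kernels (for views with few samples and many features) with a linear, explainable model. The generic identity that makes this pairing possible can be sketched as follows: with a linear kernel K = X X^T, a projection computed in the dual, K A, equals the primal projection X W with W = X^T A, so feature-level weights stay recoverable even when the model is fitted on the n x n kernel. This is a minimal NumPy illustration under our own assumptions, not BALDUR's inference procedure. The random data, the dual coefficients `A`, the helper names `linear_kernel` and `primal_weights_from_dual`, and the norm-based relevance score are all hypothetical.

```python
import numpy as np


def linear_kernel(X, Y):
    """Linear kernel k(x, y) = x^T y between the rows of X and Y (hypothetical helper)."""
    return X @ Y.T


def primal_weights_from_dual(X_train, A):
    """Map dual coefficients A (n x k) to primal weights W = X^T A (d x k) (hypothetical helper)."""
    return X_train.T @ A


rng = np.random.default_rng(0)
n, d, k = 20, 5000, 3             # few samples, many features, k latent factors (illustrative sizes)
X = rng.standard_normal((n, d))   # one data view with a small sample-to-feature ratio (random data)
A = rng.standard_normal((n, k))   # hypothetical dual coefficients, one column per latent factor

# Dual projection: only the n x n kernel matrix is needed.
K = linear_kernel(X, X)
Z_dual = K @ A

# Primal projection with the recovered d x k weight matrix gives the same latent values.
W = primal_weights_from_dual(X, A)
Z_primal = X @ W
assert np.allclose(Z_dual, Z_primal)

# Hypothetical relevance score: norm of each feature's weights across latent factors.
relevance = np.linalg.norm(W, axis=1)
top = np.argsort(relevance)[::-1][:10]   # indices of the 10 highest-scoring features
print(top)
```

The dual form fits n x k coefficients instead of d x k weights, which is the usual motivation for dual kernels when n is much smaller than d; because the kernel is linear, the primal weights can still be inspected feature by feature.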

Paper Structure

This paper contains 33 sections, 110 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Diagram of the graphical model of BALDUR for classification tasks (top) and the two possible view settings: primal (middle) and dual (bottom). Grey circles denote observed variables, white circles denote unobserved random variables, and rectangles represent node groups that depend on the view's feature space. Nodes without a circle correspond to hyperparameters.
  • Figure 2: Features selected per fold (x-axis) and their corresponding weight in absolute value (y-axis).
  • Figure 3: Brain regions selected by BALDUR for gray matter (top left) and intensity (top right). The bottom image depicts the overlap between gray matter (blue) and intensity (red).