A Multi-Domain Multi-Task Approach for Feature Selection from Bulk RNA Datasets

Karim Salta, Tomojit Ghosh, Michael Kirby

TL;DR

This work tackles high-dimensional bulk RNA-seq feature selection across tissues by introducing a multi-domain multi-task architecture (MDMT) that couples domain-specific variational autoencoders with a shared sparsity layer and a classifier. The approach optimizes a mixed objective over reconstruction, regularization, classification, and sparsification to promote cross-domain feature selection and rank features by cross-run frequency. Empirically, across-domain feature selection reveals novel biomarkers not captured by single-domain analyses and yields well-clustered latent representations for several phenotype pairs, improving interpretability of host immune responses. The method offers a scalable framework for cross-tissue transcriptome analysis with potential to uncover tissue-context biomarkers and guide future biological investigations.
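The ranking step mentioned above (ordering features by how often they are selected across runs) can be illustrated with a minimal sketch. It assumes that each training run yields a set of selected feature names and that a feature's rank is simply the number of runs that selected it. The function name `rank_features_by_frequency`, the tie-breaking rule, and the gene names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_features_by_frequency(runs, top_k=None):
    """Rank features by the number of runs in which they were selected.

    runs: iterable of iterables of feature names, one per training run.
    Returns a list of (feature, frequency) pairs, most frequent first.
    Ties are broken alphabetically (an assumption made here so that the
    ordering is deterministic).
    """
    counts = Counter()
    for selected in runs:
        # Count each feature at most once per run.
        counts.update(set(selected))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical selections from three independent runs.
runs = [
    ["gene_a", "gene_b", "gene_c"],
    ["gene_a", "gene_c"],
    ["gene_a", "gene_d"],
]
ranking = rank_features_by_frequency(runs)
for feature, freq in ranking:
    print(f"{feature}: selected in {freq} of {len(runs)} runs")
```

Features that recur in most runs are less likely to be artifacts of a particular initialization or data split, which is one plausible reason to rank by frequency rather than by the weights of a single run.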

Abstract

This paper proposes a multi-domain multi-task algorithm for feature selection in bulk RNA-seq data. Two datasets arising from the mouse host immune response to Salmonella infection are investigated. The data were collected from several strains of Collaborative Cross mice, with samples from the spleen and the liver serving as the two domains. Several machine learning experiments are conducted, and in each case a small subset of features that discriminate across domains is extracted. The algorithm proves viable and underlines the benefits of across-domain feature selection by extracting new subsets of discriminative features that could not be extracted by a single-domain approach alone.

Paper Structure

This paper contains 7 sections, 2 equations, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Network design
  • Figure 2: Weighted components of the overall loss for the tolerant versus susceptible across-domain experiment: (a) reconstruction error, (b) classification error, (c) sparsity loss, (d) total loss.
  • Figure 3: Weighted components of the overall loss for the resistant versus susceptible across-domain experiment: (a) reconstruction error, (b) classification error, (c) sparsity loss, (d) total loss.
  • Figure 4: Weighted components of the overall loss for the infected versus never-infected mice across-domain experiment: (a) reconstruction error, (b) classification error, (c) sparsity loss, (d) total loss.
  • Figure 5: PCA projections of the latent space for all three across-domain experiments: (a) tolerant (TOL) versus susceptible (SUS) phenotypes, (b) resistant (RES) versus susceptible (SUS) phenotypes, (c) infected (INF) versus never-infected (NOT INF) mice.
  • ...and 2 more figures