Meta-Learning on Augmented Gene Expression Profiles for Enhanced Lung Cancer Detection

Arya Hadizadeh Moghaddam, Mohsen Nayebi Kerdabadi, Cuncong Zhong, Zijun Yao

TL;DR

This study presents a meta-learning-based approach for predicting lung cancer from gene expression profiles using four distinct datasets and shows the superior performance of meta-learning on augmented source data compared to the baselines trained on single datasets.

Abstract

Gene expression profiles obtained through DNA microarrays have proven successful in providing critical information for cancer detection classifiers. However, the limited number of samples in these datasets makes it difficult to apply complex methods such as deep neural networks. To address this "small data" dilemma, meta-learning has been introduced to improve the optimization of machine learning models by drawing on similar datasets, enabling faster adaptation to a target dataset without requiring a large number of target samples. In this study, we present a meta-learning-based approach for predicting lung cancer from gene expression profiles. We apply this framework to well-established deep learning architectures and use four distinct datasets for the meta-learning tasks, with one serving as the target dataset and the remaining three as source datasets. Our approach is evaluated against both traditional and deep learning methods, and the results show that meta-learning on augmented source data outperforms baselines trained on single datasets. Moreover, we conduct a comparative analysis of meta-learning and transfer learning to highlight the efficiency of the proposed approach in addressing the challenges of limited sample sizes. Finally, we include an explainability study to illustrate how decisions made with meta-learning differ from those made without it.

Paper Structure (14 sections, 15 equations, 4 figures, 7 tables)

This paper contains 14 sections, 15 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: A visualization of the proposed method's architecture is presented. Initially, a feature selection process is employed to identify the most informative features while reducing dimensionality. Subsequently, the neural networks receive samples from source data and target batches, and loss values from the target and source datasets are used in meta-learning optimization.
  • Figure 2: Illustration of the meta-loss adaptation with the source and target losses. Meta-learning achieves better convergence of the loss function across diverse datasets, with improved generalization.
  • Figure 3: The effect of varying $\lambda$ on the F1 score, showing the influence of meta-learning.
  • Figure 4: SHAP values for the test set of GSE135304, used to find the most important gene sequences that affect the model's decisions. SHAP values for the Transformer with versus without meta-learning are presented.
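
The captions for Figures 1 to 3 describe source and target losses being combined in a meta-learning optimization, with a weight $\lambda$ controlling the influence of meta-learning. This page does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: a linear classifier stands in for the deep networks, the toy data is synthetic, and $\lambda$ is assumed to weight the mean source loss against the target loss. The names `meta_step` and `make_toy_dataset` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss_and_grad(w, X, y):
    """Binary cross-entropy loss and its gradient for a linear classifier."""
    p = sigmoid(X @ w)
    eps = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def meta_step(w, target, sources, lam=0.5, lr=0.1):
    """One gradient step on an ASSUMED combined objective:
    (1 - lam) * target_loss + lam * mean(source_losses)."""
    t_loss, t_grad = bce_loss_and_grad(w, *target)
    per_source = [bce_loss_and_grad(w, X, y) for X, y in sources]
    s_loss = np.mean([loss for loss, _ in per_source])
    s_grad = np.mean([grad for _, grad in per_source], axis=0)
    loss = (1 - lam) * t_loss + lam * s_loss
    grad = (1 - lam) * t_grad + lam * s_grad
    return w - lr * grad, loss

def make_toy_dataset(rng, n, d, shift):
    """Synthetic stand-in for one expression dataset (not real data)."""
    X = rng.normal(size=(n, d)) + shift
    y = (X[:, 0] + 0.5 * X[:, 1] > 1.5 * shift).astype(float)
    return X, y

rng = np.random.default_rng(0)
d = 5  # number of features kept after a feature selection step
target = make_toy_dataset(rng, 20, d, 0.0)  # small target dataset
sources = [make_toy_dataset(rng, 100, d, s) for s in (0.1, -0.1, 0.2)]

w = np.zeros(d)
losses = []
for _ in range(200):
    w, loss = meta_step(w, target, sources, lam=0.5)
    losses.append(loss)
```

Under this assumed objective, setting `lam=0.0` reduces the update to training on the target dataset alone, while larger values let the three source datasets shape the update more strongly, which is the kind of trade-off a sweep over $\lambda$ would probe.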