Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner

Xubin Wang, Yunhe Wang, Zhiqing Ma, Ka-Chun Wong, Xiangtao Li

TL;DR

The study tackles cancer type screening from high-dimensional gene expression data, where small sample sizes and noise hinder biomarker discovery. It introduces Evolutionary Optimized Diverse Ensemble Learning (EODE), a framework that combines Grey Wolf Optimizer (GWO)-based feature selection with a diverse ensemble strategy: subspace generation to build a pool of base classifiers, GWO-based ensemble pruning, and plurality voting for the final prediction. Across 35 cancer gene expression datasets, EODE achieves higher accuracy and better generalization than single classifiers and multiple baselines, including many nature-inspired methods, while often selecting compact biomarker subsets. The approach shows practical potential for biomarker discovery and precision oncology, and the code is publicly available on GitHub for reproducibility.

Abstract

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. The results demonstrate that EODE obtains significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is available on GitHub at https://github.com/wangxb96/EODE.

Paper Structure (29 sections, 16 equations, 5 figures, 5 tables, 2 algorithms)

Figures (5)

  • Figure 1: Overview of the proposed EODE algorithm. In the GWO feature selection phase, the original cancer gene expression training data is used to train all base classifiers, and the best-performing classifier is selected as the evaluation classifier. The feature-selected data is then used to construct an ensemble model. Specifically, the training data is incrementally clustered using the K-means method to form subspace clusters. These clusters are used to train individual base classifiers, which are then added to the model pool. Classifiers in the pool with below-average performance are filtered out. Next, GWO is applied to the classifier pool to determine the best ensemble combination. Finally, the optimized ensemble model is evaluated on the independent test dataset using a plurality voting strategy to generate the final cancer type predictions.
  • Figure 2: Schematic of the GWO algorithm, highlighting how the positions of the wolves are updated. Initially, the positions of the wolves are randomly initialized within the solution space. The fitness of each wolf is evaluated with a fitness function. In each iteration, the positions of the wolves are updated using formulas that reflect the social hierarchy, with the $\alpha$ wolf having the greatest influence. The update attracts the other wolves towards the positions of the $\alpha$, $\beta$, and $\delta$ wolves. This iterative position updating continues until a termination condition is met. Ultimately, the position of the $\alpha$ wolf represents the best solution found by the GWO algorithm.
  • Figure 3: Performance comparison with other nature-inspired ensemble learning algorithms. (A) Test classification results of EODE and four other nature-inspired ensemble methods across the 35 cancer gene expression datasets. (B) Comparison graphs of EODE and the other four nature-inspired ensemble methods. (C) The average performance of EODE and the other four nature-inspired ensemble methods across the 35 cancer gene expression datasets.
  • Figure 4: Performance comparisons of the different machine learning algorithms. The first seven graphs show the test classification accuracy on the different cancer gene expression datasets, and the last graph shows the average performance of the seven methods on the 35 datasets.
  • Figure 5: Performance comparisons of the different ensemble learning algorithms. (A) Test classification results of EODE and six other ensemble methods across the 35 cancer gene expression datasets; (B) The average performance of EODE and the six other ensemble classifiers on the 35 datasets; (C) Graphs of EODE versus the other ensemble classifiers, where RF denotes Random Forest.
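
The position update described in the Figure 2 caption can be illustrated with a short sketch. The code below follows the commonly published continuous form of GWO, in which each wolf moves to the average of three estimates derived from the $\alpha$, $\beta$, and $\delta$ wolves; it is not taken from the EODE repository. The function name `gwo_minimize`, its parameters, and the sphere objective are assumptions chosen for illustration. EODE applies GWO to feature selection and ensemble pruning, which would additionally need a binary encoding of the positions that is not shown here.

```python
import random


def gwo_minimize(fitness, dim, bounds, n_wolves=20, n_iters=200, seed=0):
    """Minimize `fitness` over a box with the standard GWO position update.

    Illustrative sketch only: names and defaults are assumptions,
    not the EODE implementation. Requires n_wolves >= 3.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initialization of the pack within the solution space.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")

    for t in range(n_iters):
        # Rank the pack; the three fittest wolves act as alpha, beta, delta.
        ranked = sorted(wolves, key=fitness)
        leaders = [list(w) for w in ranked[:3]]
        alpha_fit = fitness(leaders[0])
        if alpha_fit < best_fit:
            best_pos, best_fit = list(leaders[0]), alpha_fit

        # Control parameter a decreases linearly from 2 towards 0.
        a = 2.0 * (1.0 - t / n_iters)

        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # step coefficient in [-a, a]
                    C = 2.0 * rng.random()           # leader weight in [0, 2]
                    D = abs(C * leader[d] - x[d])    # distance to the leader
                    estimate += leader[d] - A * D
                # Average the three estimates and clip to the search box.
                new_x.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(new_x)
        wolves = new_wolves

    # Account for the final pack, which the loop has not yet evaluated.
    final = min(wolves, key=fitness)
    final_fit = fitness(final)
    if final_fit < best_fit:
        best_pos, best_fit = list(final), final_fit
    return best_pos, best_fit


def sphere(x):
    """Toy objective (sum of squares) with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    pos, fit = gwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
    print("best position:", pos)
    print("best fitness:", fit)
```

For a feature selection use like the one in Figure 1, the fitness function would presumably score a candidate gene subset with the evaluation classifier, but that mapping depends on details of the paper that this section does not give.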