Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner
Xubin Wang, Yunhe Wang, Zhiqing Ma, Ka-Chun Wong, Xiangtao Li
TL;DR
The study tackles cancer type screening from high-dimensional gene expression data, where small sample sizes and noise hinder biomarker discovery. It introduces Evolutionary Optimized Diverse Ensemble Learning (EODE), a framework that combines Grey Wolf Optimizer (GWO)-based feature selection with a diverse ensemble strategy built on subspace generation, GWO-based ensemble pruning, and plurality voting. Evaluation across 35 cancer gene expression datasets shows EODE achieving higher accuracy and better generalization than single classifiers and multiple baselines, including many nature-inspired methods, often while selecting compact biomarker subsets. The approach shows practical potential for biomarker discovery and precision oncology, and the code is publicly available on GitHub for reproducibility.
Abstract
Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is publicly available on GitHub at https://github.com/wangxb96/EODE.
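To make two of the building blocks named above concrete, the following is a minimal sketch of a textbook grey wolf optimizer position update, a threshold that turns a wolf's position into a gene mask, and plurality voting over base-model predictions. It is not the authors' implementation (the linked repository is the authoritative source). The function names `gwo_step`, `to_mask`, and `plurality_vote`, the clamping of positions to [0, 1], the 0.5 threshold, and the tie-breaking rule are all assumptions made for illustration.

```python
import random
from collections import Counter


def gwo_step(wolves, fitness, a, rng):
    """One textbook GWO update: every wolf moves toward the three fittest wolves.

    wolves:  list of positions, each a list of floats in [0, 1] (needs >= 3 wolves)
    fitness: callable scoring a position, higher is better (assumption)
    a:       exploration coefficient, usually decayed from 2 to 0 over iterations
    """
    leaders = sorted(wolves, key=fitness, reverse=True)[:3]  # alpha, beta, delta
    updated = []
    for wolf in wolves:
        new_pos = []
        for d, x in enumerate(wolf):
            pulls = []
            for leader in leaders:
                A = 2 * a * rng.random() - a      # step size and direction
                C = 2 * rng.random()              # random weight on the leader
                D = abs(C * leader[d] - x)        # distance to the weighted leader
                pulls.append(leader[d] - A * D)
            mean_pull = sum(pulls) / len(pulls)
            # Clamp to [0, 1] so the position can be thresholded (assumption).
            new_pos.append(min(1.0, max(0.0, mean_pull)))
        updated.append(new_pos)
    return updated


def to_mask(position, threshold=0.5):
    """Turn a continuous position into a boolean gene-selection mask."""
    return [p > threshold for p in position]


def plurality_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions: list of lists, one inner list of labels per base model.
    Ties go to the tied label predicted by the earliest model (assumption).
    """
    combined = []
    for sample_labels in zip(*predictions):
        counts = Counter(sample_labels)
        top = max(counts.values())
        combined.append(next(l for l in sample_labels if counts[l] == top))
    return combined
```

In a full pipeline the fitness function would typically score a gene mask by the validation accuracy of a classifier trained on the selected genes. The same kind of search could in principle be run over a mask of ensemble members rather than genes to prune the ensemble, but how EODE encodes and scores either search is not specified in this abstract.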
