
Identification of Prognostic Biomarkers for Stage III Non-Small Cell Lung Carcinoma in Female Nonsmokers Using Machine Learning

Huili Zheng, Qimin Zhang, Yiru Gong, Zheyan Liu, Shaohan Chen

TL;DR

This study seeks prognostic biomarkers for stage III NSCLC in non-smoking females by analyzing high-dimensional gene expression data from the GDS3837 dataset. Using XGBoost, the authors achieve strong discrimination between stage III+ tumors and earlier stages (AUC 0.835) and identify five top biomarkers (C/EBPα, LDHA, UNC-45B, CHK1, HIF-1α) with literature-supported links to lung cancer. The work shows that machine learning can extract clinically relevant biomarkers from complex genomic data, with potential implications for early diagnosis and personalized therapy. Overall, it underscores the value of integrating ML with molecular profiling to advance biomarker discovery in lung cancer subgroups.
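The task is framed as binary classification: stage III and above versus earlier stages. A minimal sketch of that labeling step follows. It assumes stage annotations arrive as strings such as "IA", "IIB", "IIIA", or "Stage IV"; the actual GDS3837 annotation format is an assumption here, and `binarize_stage` is a hypothetical helper, not the authors' code.

```python
import re

def binarize_stage(stage: str) -> int:
    """Return 1 for stage III or higher, 0 for stages I-II.

    Assumes (hypothetically) labels like "I", "IB", "IIA", "IIIA", "IV",
    optionally prefixed with "Stage ".
    """
    s = stage.strip().upper()
    if s.startswith("STAGE"):
        s = s[len("STAGE"):].strip()
    # Match the Roman numeral, longest alternatives first; A/B substages are ignored.
    m = re.match(r"(IV|III|II|I)", s)
    if m is None:
        raise ValueError(f"unrecognized stage label: {stage!r}")
    return 1 if m.group(1) in ("III", "IV") else 0
```

For example, `[binarize_stage(s) for s in ["IA", "IIB", "IIIA", "Stage IV"]]` gives `[0, 0, 1, 1]`.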

Abstract

Lung cancer remains a leading cause of cancer-related deaths globally, and non-small cell lung cancer (NSCLC) is its most common subtype. This study aimed to identify key biomarkers associated with stage III NSCLC in non-smoking females using gene expression profiles from the GDS3837 dataset. Using the XGBoost machine learning algorithm, the analysis achieved strong predictive performance, with an AUC of 0.835. The top biomarkers identified were CCAAT enhancer binding protein alpha (C/EBP-alpha), lactate dehydrogenase A4 (LDHA), UNC-45 myosin chaperone B (UNC-45B), checkpoint kinase 1 (CHK1), and hypoxia-inducible factor 1 subunit alpha (HIF-1-alpha); each has been reported in the literature as significantly linked to lung cancer. These findings highlight the potential of these biomarkers for early diagnosis and personalized therapy, and the value of integrating machine learning with molecular profiling in cancer research.
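An AUC of 0.835 can be read as the probability that the model scores a randomly chosen stage III+ sample above a randomly chosen earlier-stage sample, with ties counting as one half. A minimal sketch of that pairwise (Mann-Whitney) formulation follows; `roc_auc` is a hypothetical helper for illustration, not the authors' evaluation code.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise Mann-Whitney formulation.

    labels: 1 for positive (e.g. stage III+), 0 for negative.
    scores: model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    # Compare every positive/negative pair: O(n_pos * n_neg), fine for cohort-sized data.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`: three of the four positive/negative pairs are ordered correctly.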

Paper Structure

This paper contains 9 sections, 2 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: GDS3837 Gene Expression Heatmap
  • Figure 2: ROC Curve of the Model
  • Figure 3: Feature Importance
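The top biomarkers are read off a feature-importance ranking of the kind Figure 3 shows. A minimal sketch of that ranking step follows, assuming importance scores are already available as a feature-to-score dictionary (for example, the dictionary returned by XGBoost's `Booster.get_score`); which importance type the authors used is not stated here. `top_k_features` and the probe names are hypothetical, and mapping microarray probe IDs to gene symbols is not shown.

```python
from typing import Dict, List, Tuple

def top_k_features(importance: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k features with the largest importance scores, in descending order.

    Ties are broken alphabetically by feature name so the output is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

For example, `top_k_features({"probe_a": 0.12, "probe_b": 0.4, "probe_c": 0.25}, k=2)` returns `[("probe_b", 0.4), ("probe_c", 0.25)]`.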