Pan-Cancer Mapping of the Tumor Immune Landscape through Metagene Clustering and Predictive Modeling

Soham Chatterjee

Abstract

As immunotherapies become standard cancer treatments, it is increasingly important to identify a patient's immune profile, which encompasses the activity of immune cells within the tumor microenvironment and the presence of specific biomarkers. However, despite advances in immune profiling with high-throughput sequencing, the mechanisms that drive immune phenotypes remain unclear. This study aimed to identify novel, robust immune-related gene clusters (metagenes) and to evaluate their prognostic significance and functional relevance across cancer types using a comprehensive computational pipeline. We acquired pan-cancer bulk RNA-seq data and established immune subtype labels from The Cancer Genome Atlas (TCGA). Using expression-based filtering of genes with ANOVA and clustering with a Gaussian Mixture Model (GMM), we identified 48 unique metagenes. These metagenes achieved 87% accuracy in predicting the established subtypes. SHAP analysis revealed the most predictive metagenes per subtype, while functional enrichment analysis identified their associated pathways. Genes were ranked by differential expression between high- and low-expression groups. The metagenes revealed several biological patterns, including co-expression of immune activation and regulatory factors, links between cell cycle regulation and immune evasion, and dynamic microenvironment remodeling signatures. Kaplan-Meier survival analysis and multivariate Cox regression showed that many metagenes had prognostic value for overall survival. Overall, the metagenes represent coordinated biological programs across diverse cancer types, providing a foundation for developing robust, broadly applicable immuno-oncology biomarkers that extend beyond single-gene markers. They demonstrate prognostic value across cancer types and hold potential to guide immunotherapy treatment decisions.
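
The ANOVA-based gene filtering step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a one-way ANOVA F-statistic computed per gene across immune subtypes, and the function names (`anova_f`, `select_genes`), the toy expression values, and the cutoff `min_f` are all hypothetical. The GMM clustering step that groups the retained genes into metagenes is not shown.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numeric values."""
    k = len(groups)                          # number of groups (subtypes)
    n = sum(len(g) for g in groups)          # total number of samples
    grand = sum(sum(g) for g in groups) / n  # grand mean over all samples
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # No within-group variation: F is undefined, treat as 0 or infinity.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes(expr, labels, min_f):
    """Keep genes whose F-statistic across subtypes is at least min_f.

    expr:   dict mapping gene name -> list of expression values, one per sample
    labels: subtype label for each sample, in the same order
    min_f:  hypothetical cutoff, chosen here only for illustration
    """
    selected = []
    for gene, values in expr.items():
        by_subtype = {}
        for value, label in zip(values, labels):
            by_subtype.setdefault(label, []).append(value)
        if anova_f(list(by_subtype.values())) >= min_f:
            selected.append(gene)
    return selected


# Toy data (invented): GENE_A separates the two subtypes, GENE_B does not.
labels = ["C1", "C1", "C1", "C2", "C2", "C2"]
expr = {
    "GENE_A": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],
    "GENE_B": [2.0, 3.0, 2.5, 2.4, 2.1, 3.0],
}
print(select_genes(expr, labels, min_f=10.0))
```

In this toy example only the gene whose expression differs between subtypes passes the filter. A real analysis would typically also use a p-value with multiple-testing correction rather than a raw F cutoff.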

Paper Structure

This paper contains 46 sections, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Gene selection and clustering analysis. (a) t-SNE visualization of ANOVA-selected genes, and (b) t-SNE visualization using XGBoost-selected features.
  • Figure 2: Normalized confusion matrix of predicted versus true subtype labels for the XGBoost model.
  • Figure 3: Panels a-f (corresponding to subtypes C1-C6) show SHAP value distributions for the top 20 most predictive metagene-derived features identified for each subtype. Each violin represents the contribution of a feature to the model's prediction, with wider sections indicating higher sample density. Positive and negative SHAP values reflect feature influence toward or against classification of the corresponding subtype.
  • Figure 4: Kaplan-Meier plots showing overall survival differences between high (above-median) and low (below-median) expression groups for three distinct metagene-derived features: (a) Metagene 110 (mean), (b) Metagene 81 (mean), and (c) Metagene 101 (mean).
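
The median-split survival comparison described for Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`median_split`, `kaplan_meier`) and the toy scores, follow-up times, and event flags are invented, and samples whose score equals the median are left out of both groups, which is a choice the caption does not specify.

```python
from statistics import median


def median_split(scores):
    """Return sample indices with scores above and below the median."""
    cut = median(scores)
    high = [i for i, s in enumerate(scores) if s > cut]
    low = [i for i, s in enumerate(scores) if s < cut]
    return high, low  # samples exactly at the median are dropped (assumption)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times:  follow-up time per patient
    events: True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Handle all patients sharing the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Toy data (invented): per-sample mean expression of one metagene,
# follow-up time, and whether death was observed.
scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]
times = [5, 20, 8, 15, 3, 25]
events = [True, False, True, True, True, False]

high, low = median_split(scores)
for name, idx in (("high", high), ("low", low)):
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(name, curve)
```

Comparing the two curves formally would require a log-rank test or a Cox model, which this sketch leaves out.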