Domain-Adversarial Neural Network and Explainable AI for Reducing Tissue-of-Origin Signal in Pan-cancer Mortality Classification
Cristian Padron-Manrique, Juan José Oropeza Valdez, Osbaldo Resendis-Antonio
TL;DR
The paper tackles the problem that tissue-of-origin signals dominate pan-cancer survival analyses, hindering the discovery of universal mortality biomarkers. It introduces a Domain-Adversarial Neural Network (DANN) trained on TCGA RNA-seq data to learn tissue-invariant representations focused on mortality, complemented by layer-aware SHAP analyses and SHAP-guided clustering to reveal pan-cancer survival subpopulations. Results show that the DANN reduces tissue bias in its learned representations, but standard input-space SHAP explanations remain tissue-dominated. SHAP-based explanations computed at hidden layers uncover clearer survival-relevant structure and enable identification of five prognostic gene clusters across cancers. The approach demonstrates the value of combining domain adaptation with layer-wise interpretability to separate mortality signals from tissue-of-origin signals, supporting more robust pan-cancer biomarker discovery and interpretable patient stratification. Collectively, this framework advances generalizable survival prediction across tumor types and provides a roadmap for layer-aware XAI in high-dimensional, multi-domain biomedical data.
Abstract
Tissue-of-origin signals dominate pan-cancer gene expression, often obscuring molecular features linked to patient survival. This hampers the discovery of generalizable biomarkers, as models tend to overfit tissue-specific patterns rather than capture survival-relevant signals. To address this, we propose a Domain-Adversarial Neural Network (DANN) trained on TCGA RNA-seq data to learn representations less biased by tissue and more focused on survival. Identifying tissue-independent genetic profiles is key to revealing core cancer programs. We assess the DANN using: (1) standard SHAP, computed in the original input space with respect to the DANN's mortality classifier; and (2) a layer-aware strategy applied to hidden activations, comprising an unsupervised manifold built from raw activations and a supervised manifold built from mortality-specific SHAP values. Standard SHAP remains confounded by tissue signals due to biases inherent in its computation. The raw activation manifold is dominated by high-magnitude activations, which mask subtle tissue and mortality-related signals. In contrast, the layer-aware SHAP manifold offers improved low-dimensional representations of both tissue and mortality signals, independent of activation strength, enabling subpopulation stratification and pan-cancer identification of survival-associated genes.
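
As an illustration of the domain-adversarial idea, the sketch below shows the gradient-reversal step used in standard DANN formulations. The abstract does not describe the architecture, so this is an assumption and not the authors' implementation; the names `GradientReversal`, `encoder_gradient`, and `lam`, and the toy gradient values, are hypothetical.

```python
import numpy as np


class GradientReversal:
    """Identity on the forward pass; multiplies gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        # lam (hypothetical name) controls how strongly the tissue signal is penalized.
        self.lam = lam

    def forward(self, x):
        # Features reach the tissue (domain) classifier unchanged.
        return x

    def backward(self, grad_output):
        # The sign flip pushes the shared encoder to make tissue prediction harder.
        return -self.lam * grad_output


def encoder_gradient(grad_from_mortality, grad_from_tissue, grl):
    """Combine the gradients that reach the shared features from both heads."""
    return grad_from_mortality + grl.backward(grad_from_tissue)


# Toy values, assumed for illustration only.
grl = GradientReversal(lam=0.5)
features = np.array([[0.2, -1.0, 3.0]])
passed_through = grl.forward(features)

g_mortality = np.array([[0.1, 0.1, 0.1]])  # gradient of the mortality loss w.r.t. features
g_tissue = np.array([[0.4, -0.2, 0.0]])    # gradient of the tissue loss w.r.t. features
combined = encoder_gradient(g_mortality, g_tissue, grl)
```

In this formulation the encoder descends the mortality loss while ascending the tissue loss, which is how a shared representation can become less tissue-specific while staying informative about mortality.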
