Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification
Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin
TL;DR
This work addresses the challenge of predicting genetic mutations from gigapixel whole slide images (WSIs) by introducing BPGT, a Biological-knowledge enhanced PathGenomic multi-label Transformer. BPGT jointly leverages visual features, linguistic gene descriptions, and biomedical gene relations through two components: a gene encoder, built from a gene graph and a knowledge association module, and a label decoder, built from a modality fusion module and a comparative multi-label loss. This design predicts mutations for many genes in a single model rather than training hundreds of binary classifiers. The approach achieves state-of-the-art performance on TCGA, with improved discrimination and robustness across genes and cancer types, and its attention is interpretable, aligning with tumorous regions and gene pathways. The work shows how multi-modal knowledge can guide mutation prediction from histopathology images and offers a reproducible pipeline for future research.
Abstract
Predicting genetic mutations from whole slide images (WSIs) is indispensable for cancer diagnosis. However, existing work, which trains multiple binary classification models, faces two challenges: (a) training multiple binary classifiers is inefficient and inevitably leads to a class imbalance problem; (b) the biological relationships among genes are overlooked, which limits prediction performance. To tackle these challenges, we design a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance. BPGT first establishes a novel gene encoder that constructs gene priors with two carefully designed modules: (a) a gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by the genes' pathway associations and mutation consistencies; (b) a knowledge association module that fuses linguistic and biomedical knowledge into gene priors through transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations. BPGT then designs a label decoder that performs the final genetic mutation prediction with two tailored modules: (a) a modality fusion module that first fuses the gene priors with critical regions in WSIs and obtains gene-wise mutation logits; (b) a comparative multi-label loss that emphasizes the inherent comparisons among mutation statuses to enhance discrimination capability. Extensive experiments on The Cancer Genome Atlas (TCGA) benchmark demonstrate that BPGT outperforms the state-of-the-art.
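To make the label decoder idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how gene priors are fused with WSI regions or the exact form of the comparative loss, so two assumptions are swapped in here: fusion is modeled as single-head scaled dot-product attention in which each gene prior queries the patch features, and the loss is a generic pairwise ranking loss that pushes logits of mutated genes above those of non-mutated genes. The names `gene_logits`, `pairwise_ranking_loss`, and `w_out`, and the random inputs standing in for encoder outputs and patch embeddings, are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gene_logits(gene_priors, patch_feats, w_out):
    """Return one logit per gene (assumed fusion: cross-attention pooling)."""
    d = len(patch_feats[0])
    logits = []
    for q in gene_priors:
        # Attention weights of this gene's prior over all WSI patches.
        attn = softmax([dot(q, p) / math.sqrt(d) for p in patch_feats])
        # Gene-specific slide summary: attention-weighted mean of patches.
        pooled = [sum(a * p[j] for a, p in zip(attn, patch_feats))
                  for j in range(d)]
        # Hypothetical linear head mapping the summary to a mutation logit.
        logits.append(dot(pooled, w_out))
    return logits


def pairwise_ranking_loss(logits, labels):
    """Stand-in comparative loss (an assumption, not the paper's formula):
    log(1 + sum over (negative, positive) pairs of exp(s_neg - s_pos))."""
    pos = [s for s, y in zip(logits, labels) if y == 1]
    neg = [s for s, y in zip(logits, labels) if y == 0]
    return math.log1p(sum(math.exp(n - p) for n in neg for p in pos))


# Toy inputs: random vectors stand in for gene priors and patch embeddings.
random.seed(0)
n_genes, n_patches, dim = 4, 6, 8
gene_priors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_genes)]
patch_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
w_out = [random.gauss(0, 1) for _ in range(dim)]

logits = gene_logits(gene_priors, patch_feats, w_out)
labels = [1, 0, 0, 1]  # hypothetical mutation status for the four genes
loss = pairwise_ranking_loss(logits, labels)
```

In this sketch a single forward pass yields logits for every gene at once, which is the efficiency argument made against training one binary classifier per gene. The ranking loss compares genes against each other within a slide rather than scoring each gene in isolation, which is one plausible reading of "comparisons among mutation statuses".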
