Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

Gexin Huang; Chenfei Wu; Mingjie Li; Xiaojun Chang; Ling Chen; Ying Sun; Shen Zhao; Xiaodan Liang; Liang Lin

TL;DR

This work addresses the challenge of predicting genetic mutations from gigapixel whole slide images (WSIs) by introducing BPGT, a Biological-knowledge enhanced PathGenomic multi-label Transformer. BPGT jointly leverages visual features, linguistic gene descriptions, and biomedical gene relations through a gene encoder (a gene graph, GG, paired with a knowledge association module, KAM) and a label decoder (a modality fusion module, MFM, trained with a comparative multi-label loss). This enables multi-gene mutation prediction in a single model without training hundreds of binary classifiers. The approach achieves state-of-the-art performance on The Cancer Genome Atlas (TCGA), with improved discrimination and robustness across genes and cancer types, and its attention maps are interpretable in that they align with tumorous regions and gene pathways. The work shows how multi-modal knowledge can guide mutation prediction from histopathology images and offers a reproducible pipeline for future research.

Abstract

Predicting genetic mutations from whole slide images (WSIs) is indispensable for cancer diagnosis. However, existing work that trains multiple binary classification models faces two challenges: (a) training multiple binary classifiers is inefficient and inevitably leads to a class imbalance problem; (b) the biological relationships among genes are overlooked, which limits prediction performance. To tackle these challenges, we design a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction. BPGT first establishes a gene encoder that constructs gene priors with two modules: (a) a gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by the genes' pathway associations and mutation consistencies; (b) a knowledge association module that fuses linguistic and biomedical knowledge into gene priors through transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations. BPGT then uses a label decoder that performs the final genetic mutation prediction with two tailored modules: (a) a modality fusion module that first fuses the gene priors with critical regions in WSIs and obtains gene-wise mutation logits; (b) a comparative multi-label loss that emphasizes the inherent comparisons among mutation statuses to enhance discrimination. Extensive experiments on The Cancer Genome Atlas (TCGA) benchmark demonstrate that BPGT outperforms the state of the art.
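The abstract states that gene-graph edges are modeled partly by "mutation consistencies", which the Figure 3 caption describes as the concurrent mutation frequency of different genes. The page does not give the formula, so the following is only a minimal sketch under an assumed definition: the weight between two genes is the fraction of slides in which both are mutated. The function name `co_mutation_weights` and the toy label matrix are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_mutation_weights(labels):
    """Return a symmetric gene-by-gene matrix of co-mutation frequencies.

    labels: list of rows, one per slide, each a list of 0/1 mutation flags
            (one column per gene).
    Assumed definition (not from the paper): weight[i][j] is the fraction
    of slides in which gene i and gene j are both mutated.
    """
    n_slides = len(labels)
    n_genes = len(labels[0])
    weights = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in combinations(range(n_genes), 2):
        both = sum(1 for row in labels if row[i] == 1 and row[j] == 1)
        weights[i][j] = weights[j][i] = both / n_slides
    return weights


# Toy data: 4 slides, 3 genes (hypothetical values).
labels = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
W = co_mutation_weights(labels)
print(W[0][1])  # genes 0 and 1 are co-mutated in 2 of 4 slides
```

In a pipeline like the one described, such a matrix would be one of the inputs to the edge weights, alongside the pathway associations; how the two are combined is not specified on this page.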

Paper Structure (24 sections, 10 equations, 8 figures, 3 tables)

This paper contains 24 sections, 10 equations, 8 figures, 3 tables.

Introduction
Related work
Multi-instance learning (MIL) paradigm for genetic mutation prediction in WSIs
Knowledge graph for medical images
Discussion and motivation
Overview of the proposed method
BPGT
Visual extractor
Gene encoder
Gene graph
Knowledge association module
Label decoder
Modality fusion module
Comparative multi-label loss
Experiments
...and 9 more sections

Figures (8)

Figure 1: Comparison of the flowcharts of our BPGT and existing MIL frameworks. While MIL frameworks (Fig. 1 (a)) use visual features to independently predict a 2D vector for each gene indicating its mutation status, our BPGT (Fig. 1 (b)) associates knowledge from different sources (i.e., linguistic and biomedical knowledge) in a multi-label classification paradigm, which improves efficiency, alleviates the class imbalance problem, leverages the potential co-occurrence of genetic mutations, and improves feature discriminability.
Figure 2: Illustration of the overall architecture of BPGT, which includes: (a) visual extractor; (b) gene encoder; (c) label decoder. Details of the (b1) gene graph (GG) and (b2) knowledge association module (KAM) are given in the "Gene graph" and "Knowledge association module" sections, respectively.
Figure 3: The gene encoder (GE) is designed to aggregate linguistic and biomedical knowledge into gene priors. GE contains a gene graph (GG, Fig. 3 (a)) and a knowledge association module (KAM, Fig. 3 (b)). GG covers linguistic knowledge encoding and biomedical knowledge encoding. For the linguistic knowledge encoding (Fig. 3 (a1)), gene descriptions are first obtained from GeneCard and encoded via byte pair encoding and BERT; the resulting embeddings serve as the initial gene features. The biomedical knowledge encoding comprises three approaches: phenotype encoding (Fig. 3 (a2)) encodes cancer types so that gene-cancer relationships can help predict gene mutation; pathway encoding (Fig. 3 (a3)) encodes the biomedical functions of different genes to account for their mutation relationships; consistency encoding (Fig. 3 (a4)) encodes the concurrent mutation frequency of different genes from a data-driven perspective. KAM performs transformer-based graph representation learning, using the linguistic and phenotype encodings as node features and the pathway and consistency encodings as edge weights; KAM thereby integrates these four types of genetic knowledge into gene priors.
Figure 4: The label decoder (LD) is designed to integrate the gene priors $\mathbf{P}$ and the visual features $\mathbf{F}$, which enables the gene priors in $\mathbf{P}$ to guide the model in multi-label classification. (a) The modality fusion module first uses transformer decoder layers to integrate the visual features with the gene priors via a cross-attention mechanism, yielding embeddings $\mathbf{Q}^{(I)}_{LD}$. Then, the gene-wise projection independently maps each row of $\mathbf{Q}^{(I)}_{LD}$ to its corresponding gene prediction (logit) by multiplying it with a unique learnable column vector. (b) The comparative multi-label loss is designed to enlarge the margin between the hardest positive and negative logits (red and blue circles with dotted lines in Fig. 4 (b)) to increase the discrimination between positive and negative predictions, where a prediction is positive if its logit is $>0$ and negative if it is $\leq 0$.
Figure 5: Performance comparison of BPGT and SOTA methods. Results are reported as the mean AUC over 5-fold cross-validation for different genes on different cancers. The bars of different colors in Fig. 5 (a)$\sim$(c) represent the mean AUC of different models.
...and 3 more figures
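The Figure 4 caption describes two steps that are easy to misread in prose: a gene-wise projection that maps each row of the fused embeddings to one logit using that gene's own weight vector, and a loss that widens the gap between the hardest positive and the hardest negative logit. The sketch below illustrates both in plain Python. It is not the paper's loss, whose exact formula is not given on this page: the hinge form, the `margin` parameter, and all function names and toy values are assumptions made for illustration.

```python
def gene_wise_projection(embeddings, weights):
    """Map row g of `embeddings` to a single logit using row g of `weights`.

    Each gene has its own weight vector, so rows are projected independently
    rather than through one shared linear layer.
    """
    return [
        sum(e * w for e, w in zip(emb_row, w_row))
        for emb_row, w_row in zip(embeddings, weights)
    ]


def hardest_pair_margin_loss(logits, targets, margin=1.0):
    """Hinge penalty on the gap between the hardest positive and negative.

    Hardest positive: the lowest logit among mutated genes (target 1).
    Hardest negative: the highest logit among non-mutated genes (target 0).
    Assumed form (not from the paper): max(0, margin - (pos - neg)).
    """
    positives = [z for z, t in zip(logits, targets) if t == 1]
    negatives = [z for z, t in zip(logits, targets) if t == 0]
    if not positives or not negatives:
        return 0.0  # no positive/negative pair to compare
    gap = min(positives) - max(negatives)
    return max(0.0, margin - gap)


# Toy example: 3 genes, 2-dimensional fused embeddings (hypothetical values).
embeddings = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
weights = [[1.0, 0.5], [2.0, 1.0], [-1.0, 3.0]]
targets = [1, 0, 1]

logits = gene_wise_projection(embeddings, weights)
predicted = [int(z > 0) for z in logits]  # positive if logit > 0, as in the caption
loss = hardest_pair_margin_loss(logits, targets)
print(logits, predicted, loss)
```

In this toy case the third gene is mutated but receives the lowest logit, so the hardest positive sits below the hardest negative and the penalty is large; a term of this kind pushes such pairs apart during training.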
