Dictionary-based Pathology Mining with Hard-instance-assisted Classifier Debiasing for Genetic Biomarker Prediction from WSIs

Ling Zhang, Boxiang Yun, Ting Jin, Qingli Li, Xinxing Li, Yan Wang

Abstract

Prediction of genetic biomarkers, e.g., microsatellite instability in colorectal cancer, is crucial for clinical decision making. However, two primary challenges hamper accurate prediction: (1) It is difficult to construct a pathology-aware representation that captures the complex interconnections among pathological components. (2) WSIs contain a large proportion of areas unrelated to genetic biomarkers, which makes the model prone to overfitting simple but irrelevant instances. To address these challenges, we propose a Dictionary-based hierarchical pathology mining with hard-instance-assisted classifier Debiasing framework, dubbed D2Bio. Our first module, dictionary-based hierarchical pathology mining, mines diverse and fine-grained pathological contextual interactions without being limited by the distances between patches. The second module, hard-instance-assisted classifier debiasing, learns a debiased classifier by focusing on hard but task-related features, without any additional annotations. Experimental results on five cohorts show the superiority of our method, with over 4% improvement in AUROC compared with the second best on the TCGA-CRC-MSI cohort. Our analysis further shows the clinical interpretability of D2Bio in genetic biomarker diagnosis and its potential clinical utility in survival analysis. Code will be available at https://github.com/DeepMed-Lab-ECNU/D2Bio.

Paper Structure

This paper contains 18 sections, 15 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Motivation of our dictionary-based strategy. (a) Attention-based MIL methods: simply model the relationships between instances. (b) Graph-based methods: model instance relationships without construction of real pathological components. (c) Proposed dictionary-based method: utilizes a learnable dictionary to group instances into pathological components and hierarchically mine pathological contextual interaction.
  • Figure 2: Motivation of our hard-instance-assisted classifier debiasing strategy. Suffering from redundant task-irrelevant instances, a biased classifier overfits simple instances and misjudges bag-level labels during inference. Learning from hard instances, which include task-specific instances, helps the classifier reduce this bias.
  • Figure 3: Illustration of D$^2$Bio. The overall framework consists of two parts: 1) dictionary-based hierarchical pathology mining, 2) hard-instance-assisted classifier debiasing. Given a WSI $\mathbf{X}$, D$^2$Bio first initializes a learnable dictionary to extract pathological information from $\mathbf{X}$ via a cross-attention operation. D$^2$Bio then groups instances into fine-grained pathological groups according to the similarity matrix. To hierarchically mine the interaction, Multi-head Self-attention (MSA) is first employed within each group. The features of $\mathbf{X}$ are updated by ungrouping these groups, and the updated features are then used to update the dictionary. After repeating the above steps $L$ times, an inter-group ViT is employed. Finally, D$^2$Bio assigns hard-instance pseudo labels via unsupervised clustering to supervise the classification head and reduce the bias in WSIs. The purple arrow indicates the classification branch and the orange arrow indicates the classifier debiasing branch.
  • Figure 4: Heatmap visualization of D$^2$Bio on the MSI prediction task and identified pathological patterns.
  • Figure 5: Pathological group distribution and corresponding pathological patterns on a WSI of MSI cancer.
  • ...and 8 more figures
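
The grouping loop described in the Figure 3 caption can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`group_step`, `mine`), the single-head attention without learned projections, the residual updates, and the hard argmax assignment are all hypothetical simplifications. The inter-group ViT and the debiasing branch are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_step(feats, dictionary):
    """One round. feats: (N, d) instance features; dictionary: (K, d) entries."""
    d = feats.shape[1]
    # Cross-attention: dictionary entries (queries) attend to all instances.
    sim = dictionary @ feats.T / np.sqrt(d)                  # (K, N) similarity matrix
    dictionary = dictionary + softmax(sim, axis=1) @ feats   # assumed residual update
    # Assumption: each instance joins the group of its most similar entry.
    assign = sim.argmax(axis=0)                              # (N,)
    updated = feats.copy()
    for k in range(dictionary.shape[0]):
        idx = np.flatnonzero(assign == k)
        if idx.size == 0:
            continue  # empty group, nothing to attend over
        g = feats[idx]
        # Single-head self-attention inside the group (stand-in for MSA).
        attn = softmax(g @ g.T / np.sqrt(d), axis=1)
        # "Ungroup" by writing results back to the original instance positions.
        updated[idx] = g + attn @ g
    return updated, dictionary, assign

def mine(feats, dictionary, num_rounds=2):
    # Repeat the round L = num_rounds times; updated features feed the next
    # dictionary update, as the caption describes.
    assign = None
    for _ in range(num_rounds):
        feats, dictionary, assign = group_step(feats, dictionary)
    return feats, dictionary, assign
```

Because attention is restricted to instances that share a dictionary entry, two patches can interact regardless of how far apart they lie on the slide, which is the property the abstract highlights.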