CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H\&E stained images

Hadar Hezi; Matan Gelber; Alexander Balabanov; Yosef E. Maruvka; Moti Freiman

TL;DR

The CIMIL-CRC method holds promise for offering insights into the key representations of histopathological images, is straightforward to implement, and achieves the best result reported for MSI/MSS classification on this dataset.

Abstract

Treatment approaches for colorectal cancer (CRC) are highly dependent on the molecular subtype, as immunotherapy has shown efficacy in cases with microsatellite instability (MSI) but is ineffective for the microsatellite stable (MSS) subtype. Deep neural networks (DNNs) show promise for automating the differentiation of CRC subtypes by analyzing Hematoxylin and Eosin (H\&E) stained whole-slide images (WSIs). Due to the extensive size of WSIs, Multiple Instance Learning (MIL) techniques are typically explored. However, existing MIL methods focus on identifying the most representative image patches for classification, which may result in the loss of critical information. Additionally, these methods often overlook clinically relevant information, such as the tendency for MSI tumors to occur predominantly in the proximal (right-sided) colon. We introduce `CIMIL-CRC', a DNN framework that: 1) solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches, and 2) integrates clinical priors, particularly the tumor location within the colon, into the model to enhance patient-level classification accuracy. We assessed our CIMIL-CRC method using the average area under the receiver operating characteristic curve (AUROC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort, contrasting it with a baseline patch-level classification approach, an MIL-only approach, and a clinically-informed patch-level classification approach. Our CIMIL-CRC outperformed all methods (AUROC: $0.92\pm0.002$ (95\% CI 0.91-0.92), vs. $0.79\pm0.02$ (95\% CI 0.76-0.82), $0.86\pm0.01$ (95\% CI 0.85-0.88), and $0.87\pm0.01$ (95\% CI 0.86-0.88), respectively). The improvement was statistically significant.
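The PCA aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes each patient's patch features have already been arranged into an N x D matrix (one row per patch); the function name `top_k_eigenvectors`, the mean-centering, the use of SVD, and the toy dimensions are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def top_k_eigenvectors(features: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k principal directions (k x D) of an (N x D) feature matrix.

    Hypothetical helper: the centering and SVD route are assumptions of this sketch.
    """
    if features.ndim != 2:
        raise ValueError("features must be a 2-D (patches x feature_dim) matrix")
    n_patches, feature_dim = features.shape
    if not 1 <= k <= min(n_patches, feature_dim):
        raise ValueError("k must lie between 1 and min(N, D)")
    # Center each feature column so the SVD corresponds to PCA.
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are eigenvectors of the covariance matrix,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


# Two toy "patients" with different patch counts (random stand-in features).
rng = np.random.default_rng(0)
patient_a = rng.normal(size=(200, 64))
patient_b = rng.normal(size=(350, 64))
ev_a = top_k_eigenvectors(patient_a, k=5)
ev_b = top_k_eigenvectors(patient_b, k=5)
print(ev_a.shape, ev_b.shape)
```

Under these assumptions, every patient is reduced to a matrix of the same size regardless of how many patches the slide produced, while all patches contribute to the retained directions.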

Paper Structure (15 sections, 8 equations, 9 figures, 2 tables)

Introduction
Related work
Methods
Feature extraction
Embedded representation of the extracted feature matrices
Patient-level classification
Clinical Data Incorporation
Training settings
Experiments
Dataset
Experimental methodology
Classification models
Evaluation methodology
Results
Discussion and Conclusion

Figures (9)

Figure 1: Model architecture overview: (a) Patches from each patient are processed through a pre-trained Efficient-net, operating in evaluation mode, to perform patch classification. This yields extracted features of dimension [2560,7,7]. These feature sets for each patient are then arranged into matrices and stored. (Image of CNN by Haris Iqbal cnn_im). (b) PCA is subsequently applied to each patient's feature matrix, retaining and saving the top-k eigenvectors. (c) These eigenvector matrices are input into an MLP for further feature extraction, resulting in the matrix $H \in \mathcal{R}^{k \times 512}$ (MLP illustration by Izaak Neutelings MLP_im). (d) Another MLP processes matrix $H$ to create the attention matrix $A \in \mathcal{R}^{k \times 3}$, followed by applying the $\mathrm{softmax}$ function to its rows. The matrices $A$ and $H$ are then combined through multiplication. (e) The combined output is flattened and introduced into a classifier MLP to determine the patient's MSI score. This MSI score is subsequently adjusted by the side function (Equation \ref{eq:side}), generating the final patient-specific MSI score.
Figure 2: Data preprocessing chart. From the TCGA-CRC cohort (n=430), Kather et al. selected and pre-processed a subset (n=360) for publication. This subset was divided into a training set (n=260) and a testing set (n=100). Balancing was applied at the patch level within the training set by excluding MSS patches.
Figure 3: (a) Our external test set consists of n=100 patients, randomly selected by Kather et al. from their total cohort of n=360 Kather2019DeepCancer. The remaining n=260 patients were divided into 5 stratified folds for cross-validation, ensuring a balanced class distribution across the folds. We evaluated our methods using the AUROC, AP, F1-score, and Cohen's kappa scores, applying each fold's model to the external test set. This evaluation was conducted for both the patch classification models and our MIL-based models (MIL-CRC, CIMIL-CRC). Additionally, for a comprehensive comparison, we also incorporated the clinical information prior into the patch-classification model (resulting in CI-Baseline). (b) The setup for using clinical side information alone as the classifier. As this experiment is deterministic, we applied it to the test set only (n=100) and reported Cohen's kappa and accuracy.
Figure 4: Baseline model architecture. Patches are fed into the Efficient-net b7 for feature extraction. The last two layers are fully connected classifier layers. The outputs then pass through a $\mathrm{softmax}$ layer to generate probabilities. $N$ represents the number of patches for each patient, with $F(x_i)$ denoting the MSI probabilities of these patches. The MSI score for each patient, $P_w$, is calculated as the average of these MSI probabilities.
Figure 5: Validation-set metrics as a function of the number of selected eigenvectors (EVs), averaged over the 5 folds. (a) F1-score, (b) Cohen's kappa score.
...and 4 more figures
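
The attention step in the Figure 1(d) caption and the patch-averaging rule in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the captions say only that $A$ and $H$ are "combined through multiplication", so the product order $A^\top H$ (giving a 3 x 512 matrix) is an assumption, and the function names `row_softmax`, `attention_pool`, and `mean_patch_score` are hypothetical. The side function that adjusts the final score is not reproduced here because its equation is not given in this section.

```python
import numpy as np


def row_softmax(scores: np.ndarray) -> np.ndarray:
    """Apply a numerically stable softmax to each row of a 2-D array."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def attention_pool(h: np.ndarray, attention_logits: np.ndarray) -> np.ndarray:
    """Combine H (k x d) with attention scores A (k x m) and flatten the result.

    Assumption of this sketch: the product is A^T @ H, of shape (m x d).
    """
    if h.shape[0] != attention_logits.shape[0]:
        raise ValueError("H and A must have the same number of rows (k)")
    attention = row_softmax(attention_logits)  # softmax over each row of A
    pooled = attention.T @ h                   # (m x d)
    return pooled.reshape(-1)                  # flattened classifier input


def mean_patch_score(patch_msi_probs) -> float:
    """Baseline patient score P_w: the average of the patch MSI probabilities F(x_i)."""
    probs = np.asarray(patch_msi_probs, dtype=float)
    if probs.size == 0:
        raise ValueError("at least one patch probability is required")
    return float(probs.mean())


# Toy inputs: k=4 eigenvector embeddings of width 512, uniform attention logits.
h_toy = np.ones((4, 512))
a_toy = np.zeros((4, 3))
flat = attention_pool(h_toy, a_toy)
baseline = mean_patch_score([0.2, 0.4, 0.9])
print(flat.shape, baseline)
```

With this product order, the flattened vector has 3 x 512 entries for any k, so the classifier MLP sees a fixed-size input, whereas the baseline collapses each patient to a single averaged probability.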