UNICORN: A Deep Learning Model for Integrating Multi-Stain Data in Histopathology

Valentin Koch, Sabine Bauer, Valerio Luppberger, Michael Joner, Heribert Schunkert, Julia A. Schnabel, Moritz von Scheidt, Carsten Marr

TL;DR

The paper proposes a transformer model for multi-stain integration that handles missing data during both training and inference, identifies relevant tissue phenotypes across stainings, and implicitly models disease progression.

Abstract

Background: The integration of multi-stain histopathology images through deep learning poses a significant challenge in digital histopathology. Current multi-modal approaches struggle with data heterogeneity and missing data. This study aims to overcome these limitations by developing a novel transformer model for multi-stain integration that can handle missing data during training as well as inference. Methods: We propose UNICORN (UNiversal modality Integration Network for CORonary classificatioN), a multi-modal transformer capable of processing multi-stain histopathology for atherosclerosis severity class prediction. The architecture comprises a two-stage, end-to-end trainable model with specialized modules utilizing transformer self-attention blocks. The initial stage employs domain-specific expert modules to extract features from each modality. In the subsequent stage, an aggregation expert module integrates these features by learning the interactions between the different data modalities. Results: Evaluation was performed on a multi-class dataset of atherosclerotic lesions from the Munich Cardiovascular Studies Biobank (MISSION), comprising over 4,000 paired multi-stain whole slide images (WSIs) from 7 prespecified segments of the coronary tree in 170 deceased individuals, each stained according to four histopathological protocols. UNICORN achieved a classification accuracy of 0.67, outperforming other state-of-the-art models. The model effectively identifies relevant tissue phenotypes across stainings and implicitly models disease progression. Conclusion: Our proposed multi-modal transformer model addresses key challenges in medical data analysis, including data heterogeneity and missing modalities. The model's explainability and its effectiveness in predicting atherosclerosis progression underscore its potential for broader applications in medical research.
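
The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`expert`, `aggregate`, `forward`), the mean pooling that stands in for the transformer blocks of each expert module, the single query vector that stands in for the CLS token, and the choice to handle a missing staining by simply leaving its token out of the aggregation are all assumptions made for illustration.

```python
import math

# Staining names as listed in the Figure 1 caption.
STAINS = ("H&E", "EvG", "vK", "Movat")


def expert(patch_features):
    """Stage 1 stand-in: pool one staining's patch features into a modality token.

    Assumption: mean pooling replaces the transformer blocks of the real expert.
    """
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(p[d] for p in patch_features) / n for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(modality_tokens, query):
    """Stage 2 stand-in: attention-weighted sum over the available modality tokens.

    Assumption: `query` plays the role of the CLS token (same length as the tokens).
    """
    names = list(modality_tokens)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * t for q, t in zip(query, modality_tokens[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    fused = [
        sum(w * modality_tokens[name][d] for w, name in zip(weights, names))
        for d in range(len(query))
    ]
    return fused, dict(zip(names, weights))


def forward(slides, query):
    """slides maps staining name -> list of patch feature vectors, or None if missing."""
    # Assumption: a missing (None or empty) staining is skipped entirely.
    tokens = {s: expert(slides[s]) for s in STAINS if slides.get(s)}
    if not tokens:
        raise ValueError("at least one staining is required")
    return aggregate(tokens, query)


if __name__ == "__main__":
    slides = {
        "H&E": [[1.0, 0.0], [0.0, 1.0]],
        "EvG": None,  # this staining is missing for the case
        "vK": [[2.0, 2.0]],
        "Movat": [[0.0, 0.0]],
    }
    fused, weights = forward(slides, query=[1.0, 1.0])
    print(fused, weights)
```

Because the attention weights are normalized only over the stainings that are present, the same function works for any subset of the four stainings, which is one plausible way to support missing modalities at both training and inference time.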

Paper Structure (15 sections, 7 figures, 1 table)

Figures (7)

  • Figure 1: MISSION biobank and UNICORN architecture. a) Coronary artery segments used in this study. 1+2: proximal and distal part of the right coronary artery, 3: main stem, 4+5: proximal and distal part of the left coronary artery, 6+7: proximal and distal part of the left circumflex coronary artery. b) Histological classification of coronary arteries according to an adapted and simplified AHA classification based on Virmani et al. [Virmani2000-jr]. H&E stains are shown for AIT, PIT, EFA, and LFA, and a von Kossa silver stain is shown for CFA, each as an example with zoomed-in regions of interest. Blue and red arrows show class-specific characteristics. c) Class distribution in the MISSION study cohort, with n=7 segments from each of 170 individuals. d) UNICORN architecture: features extracted from WSIs with the four different stainings Hematoxylin and Eosin (H&E), Elastica van Gieson (EvG), von Kossa (vK) and Movat pentachrome (Movat) are forwarded to four expert models, each consisting of two transformer blocks (e) and specialized in processing data from one staining. Analogous to the class token (CLS), information from each domain is aggregated in a modality token (MT). The MTs are the input to the expert aggregation model, which combines information across stainings into the CLS token. The final classification score is derived from a fully connected (FC) layer that uses the CLS token as input. e) Transformer blocks: the transformer blocks used in expert models and expert aggregation models are shown in detail. MLP: multi-layer perceptron. f) Explainable output of UNICORN: using attention values, UNICORN can provide explainable output, highlighting regions of importance across stainings. IT: intima thickening, NC: necrotic core.
  • Figure 2: UNICORN integrates information from four stainings. a) Confusion matrix shows UNICORN classifying multi-stain atherosclerosis WSIs into five classes with high accuracy. Values are color coded from white (0) to dark blue (1). b) Performance of UNICORN using just one staining as input. The highest F1-Score is achieved when using all four stainings (gray dotted line), indicating that UNICORN successfully aggregates information from the different stainings. c) The attention mechanism of the UNICORN aggregation expert module assigns higher attention values to stainings that strongly influence performance for a given class. The mean attention value of the CLS token to the four modality tokens corresponding to a given staining is shown. d) F1-Score difference when utilizing three of four stainings vs. all stainings (gray dotted zero line). Bar colors indicate which staining was excluded. The two performance-based importance measurements (b, d) correlate robustly with the attention scores learned by UNICORN (c).
  • Figure 3: High-resolution heatmaps highlight relevant regions for classification decisions. a) Original von Kossa silver stain image showing black calcification regions. b) Coloring indicates the presence of structures associated with the most severe disease classes (EFA=blue, LFA=yellow, CFA=red). c) Attention scores (low score = blue, high score/high importance = red) show high attention regions. d) Multiplying the attention score (low score = blue, high score/high importance = red) by the probability of the class predicted by UNICORN illustrates which regions UNICORN considers important and where it suspects the predicted class to be located.
  • Figure 4: UMAP reveals disease progression modeling capabilities. A UMAP of final-layer features illustrates that the model implicitly learns the natural disease progression from AIT to CFA (a). This finding holds when only one staining is used as input (b), and the model is able to distinguish which staining it is processing (c).
  • Figure 5: UNICORN highlights stain-specific, classification-relevant regions. The regions with the highest class attention are shown in red across the four different stainings (H&E, EvG, vK, Movat) for a calcified fibroatheroma (CFA) case and are enlarged in the insets. For H&E and EvG, the most relevant structures are highlighted with blue arrows.
  • ...and 2 more figures
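
The heatmap construction described for Figure 3d, attention score multiplied by the probability of the predicted class, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes that a class probability is available per patch, that attention is min-max normalized before the multiplication, and the function names `normalize` and `class_weighted_attention` are hypothetical.

```python
def normalize(values):
    """Min-max scale a list of scores to [0, 1] (assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def class_weighted_attention(attention, class_probs, predicted_class):
    """Multiply each patch's attention by its probability for the predicted class.

    attention:       one attention score per patch
    class_probs:     one dict per patch, mapping class name -> probability
                     (assumption: per-patch probabilities are available)
    predicted_class: the slide-level class predicted by the model
    """
    if len(attention) != len(class_probs):
        raise ValueError("attention and class_probs must have one entry per patch")
    scaled = normalize(attention)
    return [a * p[predicted_class] for a, p in zip(scaled, class_probs)]


if __name__ == "__main__":
    attention = [0.1, 0.5, 0.9]
    class_probs = [
        {"LFA": 0.8, "CFA": 0.2},
        {"LFA": 0.5, "CFA": 0.5},
        {"LFA": 0.0, "CFA": 1.0},
    ]
    print(class_weighted_attention(attention, class_probs, "CFA"))
```

A patch scores high only when it both receives high attention and is likely to belong to the predicted class, which matches the caption's reading of panel d as the intersection of "important" and "suspected location of the predicted class".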