Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification

Mohammad Shiri, Monalika Padma Reddy, Jiangwen Sun

TL;DR

Classifying invasive ductal carcinoma (IDC) from histopathology images is challenged by limited labeled data and the need for robust generalization. The authors introduce SupCon-ViT, a two-stage framework: it first fine-tunes a pre-trained Vision Transformer with a supervised contrastive loss to learn discriminative embeddings, then trains a linear classifier on the frozen features. On a dataset of 277,524 image patches, SupCon-ViT achieves state-of-the-art metrics (e.g., $F1=0.8188$, $precision=0.7692$, $specificity=0.8971$, $balanced\ accuracy=0.8861$) and shows clear embedding separability and effective whole-slide image (WSI) localization. This work suggests that pairing supervised contrastive learning with pre-trained transformers is a viable path to accurate, data-efficient histopathology-based IDC diagnosis with potential clinical impact.
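
The reported metrics can be cross-checked against one another. The sketch below is not from the paper: it assumes the standard definitions $F1 = 2PR/(P+R)$ and balanced accuracy $= (recall + specificity)/2$, and derives the recall implied by the reported F1 and precision. Recall itself is not stated in this summary, so the derived value is only an estimate subject to rounding in the reported figures.

```python
# Values reported in the TL;DR.
f1 = 0.8188
precision = 0.7692
specificity = 0.8971
balanced_accuracy = 0.8861

# Assumed standard definition: F1 = 2PR / (P + R), solved for R.
recall = f1 * precision / (2 * precision - f1)

# Assumed standard definition: balanced accuracy is the mean of
# recall (sensitivity) and specificity.
implied_balanced_accuracy = (recall + specificity) / 2

print(f"implied recall: {recall:.4f}")
print(f"implied balanced accuracy: {implied_balanced_accuracy:.4f}")
print(f"gap to reported value: {abs(implied_balanced_accuracy - balanced_accuracy):.4f}")
```

Under these assumptions the implied recall is roughly 0.875, and the implied balanced accuracy agrees with the reported 0.8861 to within rounding, so the four figures are mutually consistent.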

Abstract

Invasive ductal carcinoma (IDC) is the most prevalent form of breast cancer. Histopathological examination of breast tissue is critical in diagnosing and classifying breast cancer. Although existing methods have shown promising results, there is still room for improvement in the classification accuracy and generalization of IDC using histopathology images. We present a novel approach, Supervised Contrastive Vision Transformer (SupCon-ViT), that improves the accuracy and generalization of IDC classification by combining the strengths of transfer learning, i.e., a pre-trained vision transformer, and supervised contrastive learning. Our results on a benchmark breast cancer dataset demonstrate that SupCon-ViT achieves state-of-the-art performance in IDC classification, with an F1-score of 0.8188, precision of 0.7692, and specificity of 0.8971, outperforming existing methods. In addition, the proposed model demonstrates resilience in scenarios with minimal labeled data, making it well suited to real-world clinical settings where labeled data is limited. Our findings suggest that supervised contrastive learning in conjunction with pre-trained vision transformers is a viable strategy for accurate classification of IDC, paving the way for a more efficient and reliable diagnosis of breast cancer through histopathological image analysis.

Paper Structure (13 sections, 1 equation, 7 figures, 4 tables)

Figures (7)

  • Figure 1: The dataset comprises 277,524 patches of 50×50 pixels each, extracted from whole-slide images (WSIs). (a) Sample patches with a benign, i.e., non-IDC (negative), diagnosis. (b) Sample patches with a malignant, i.e., IDC (positive), diagnosis.
  • Figure 2: SupCon-ViT is trained in two stages. Stage 1 comprises the data augmentation layer, the encoder (ViT) layer, and the projection layer, and computes the supervised contrastive loss. Stage 2 adds a classification layer trained with cross-entropy loss on the frozen representations from Stage 1.
  • Figure 3: PCA visualization of the feature representations of ViT on the training set
  • Figure 4: PCA visualization of the feature representations of SupCon-ViT on the training set
  • Figure 5: Confusion Matrix of the SupCon-ViT model on the test set
  • ...and 2 more figures
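
To make the Stage 1 objective described in Figure 2 concrete, here is a minimal pure-Python sketch of a supervised contrastive loss. It assumes the commonly used formulation, in which each anchor is pulled toward same-label embeddings and pushed away from all others. The paper's own equation is not reproduced in this summary, so the exact form, the temperature value, and the toy embeddings below are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (assumed standard form).

    For each anchor i, averages -log softmax(z_i . z_p / temperature) over
    all positives p (same label, p != i); the softmax runs over every other
    sample in the batch. Anchors without positives are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors_used = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Scaled similarities between the anchor and every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denominator = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors_used += 1
    return total / anchors_used if anchors_used else 0.0


# Hypothetical 2-D projections of four patches (toy values, not real data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_clustered = supcon_loss(toy_embeddings, [0, 0, 1, 1])  # labels match geometry
loss_scrambled = supcon_loss(toy_embeddings, [0, 1, 0, 1])  # labels cut across it
print(loss_clustered < loss_scrambled)
```

The loss is lower when same-label embeddings sit close together, which is the property Stage 1 optimizes before the encoder is frozen. In a real pipeline the embeddings would come from the ViT encoder and projection layer applied to augmented patches, and the loss would be computed with a tensor library so that gradients can flow back to the encoder.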