Weakly Supervised Contrastive Learning for Histopathology Patch Embeddings

Bodong Zhang, Xiwen Li, Hamid Manoochehri, Xiaoya Tang, Deepika Sirohi, Beatrice S. Knudsen, Tolga Tasdizen

TL;DR

WeakSupCon addresses the shortage of patch-level labels in histopathology by introducing a weakly supervised contrastive learning framework that exploits bag-level labels during encoder pretraining. It splits patch features into negative and positive bag groups and optimizes two losses on a shared encoder: a Similarity Loss for negative patches and a SimCLR Loss for patches from positive bags, combining them into a single objective. Across Camelyon16, RVT, and kidney metastasis datasets, WeakSupCon-pretrained encoders yield superior downstream MIL performance compared with self-supervised and supervised baselines and, in several cases, outperform state-of-the-art histopathology foundation models. Feature-space analyses reveal clearer separation between negative and positive patches and greater diversity among positive patches, explaining the improved MIL attention and accuracy. The approach enhances robustness to domain shift and reduces reliance on dense patch-level annotation, with code available for community use.
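The two-task objective summarized above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not give the formula for the Similarity Loss, so `similarity_loss` below assumes one plausible form (one minus the mean pairwise cosine similarity). The SimCLR term uses the standard NT-Xent formulation. The function names, the weight `lam`, and the temperature value are hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def similarity_loss(z_neg):
    """ASSUMED form: 1 minus the mean pairwise cosine similarity (self-pairs excluded).

    Minimizing it pulls features from negative bags into one tight cluster.
    Requires at least two feature vectors.
    """
    z = l2_normalize(z_neg)
    sim = z @ z.T
    n = len(z)
    off_diagonal_mean = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return 1.0 - off_diagonal_mean


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard SimCLR (NT-Xent) loss; z_a[i] and z_b[i] are two views of patch i."""
    z = l2_normalize(np.concatenate([z_a, z_b], axis=0))
    n = len(z_a)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # a view is never compared with itself
    # Row i pairs with row i + n, and row i + n pairs with row i.
    partner = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), partner].mean()


def weaksupcon_style_loss(z_a, z_b, bag_labels, lam=1.0, temperature=0.5):
    """Split features by bag label (0 = negative, 1 = positive) and sum both terms.

    `lam` is a hypothetical weighting factor; the summary only states that the
    two losses are combined into a single objective.
    """
    bag_labels = np.asarray(bag_labels)
    neg = bag_labels == 0
    pos = bag_labels == 1
    loss_neg = similarity_loss(np.concatenate([z_a[neg], z_b[neg]], axis=0))
    loss_pos = nt_xent_loss(z_a[pos], z_b[pos], temperature)
    return loss_neg + lam * loss_pos


# Usage on random stand-in features: 8 patches, two augmented views each.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.1 * rng.normal(size=(8, 16))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(weaksupcon_style_loss(view_a, view_b, labels))
```

In actual training the features would come from the shared encoder inside an automatic-differentiation framework so that both terms update the same weights; NumPy is used here only to keep the arithmetic easy to inspect.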

Abstract

Digital histopathology whole slide images (WSIs) provide gigapixel-scale high-resolution images that are highly useful for disease diagnosis. However, digital histopathology image analysis faces significant challenges because training labels are limited: manually annotating specific regions or small patches cropped from large WSIs requires substantial time and effort. Weakly supervised multiple instance learning (MIL) offers a practical and efficient solution by requiring only bag-level (slide-level) labels, where each bag typically contains multiple instances (patches). Most MIL methods directly use frozen image patch features generated by various image encoders as inputs and focus primarily on feature aggregation, while feature representation learning for encoder pretraining in MIL settings has largely been neglected. In our work, we propose a novel feature representation learning framework called weakly supervised contrastive learning (WeakSupCon) that incorporates bag-level label information during training. Our method does not rely on instance-level pseudo-labeling, yet it effectively separates patches with different labels in the feature space. Experimental results demonstrate that the image features generated by our WeakSupCon method lead to improved downstream MIL performance compared to self-supervised contrastive learning approaches on three datasets. Our code is available at github.com/BzhangURU/Paper_WeakSupCon_for_MIL.

Paper Structure (19 sections, 6 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Visualizations of theoretical feature distributions after (a) SimCLR and (b) SupCon training. Green dots indicate negative (i.e., no cancer) patches, while red dots indicate positive (i.e., cancer-containing) patches.
  • Figure 2: Schematic depiction of the main concept of WeakSupCon. The patch-level features generated by the encoder are split into two subsets, based on their bag-level labels, for separate contrastive learning tasks. Green dots are features from negative bags. Red dots are features from positive bags. In task 1, our proposed Similarity Loss is applied to patches from negative bags, which form a tight cluster. In task 2, the SimCLR Loss tries to maximize the distance between features from different original patches. Dots in the same blue box are from different augmentations of the same original patch. The bottom-right part shows the anticipated feature distribution of positive ('+') and negative ('-') patches (instances) when viewed in the same feature space.
  • Figure 3: PCA visualizations of features on the (a) training set and (b) test set. Red dots denote features from positive bags; green dots denote features from negative bags. The horizontal and vertical axes represent the first two principal components. The features were downsampled for better visualization.
  • Figure 4: Histograms of cosine similarities between the anchor (the densest feature cluster center in negative slides) and randomly sampled patch features from (a) negative slides and (b) positive slides in the TRAINING set of the Camelyon16 dataset, generated by the WeakSupCon encoder. The dark color in the histograms shows the portion with cosine similarity greater than 0.999. The patch examples with red boundaries are positive patches diagnosed by our pathologists.
  • Figure 5: Histograms of cosine similarities between the anchor (the densest feature cluster center in negative slides) and randomly sampled patch features from (a) negative slides and (b) positive slides in the TEST set of the Camelyon16 dataset, generated by the WeakSupCon encoder. The dark color in the histograms shows the portion with cosine similarity greater than 0.999. The patch examples with red boundaries are positive patches diagnosed by our pathologists.
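
The similarity analysis behind Figures 4 and 5 can be outlined in a few lines of NumPy. This is a hedged sketch rather than the authors' analysis code: the captions do not say how the densest negative cluster center is located, so the anchor is simply passed in as an argument, and the function names and bin count are illustrative assumptions. Only the 0.999 threshold is taken from the captions.

```python
import numpy as np


def cosine_to_anchor(features, anchor, eps=1e-12):
    """Cosine similarity between every row of `features` and one anchor vector."""
    f = features / np.maximum(np.linalg.norm(features, axis=1, keepdims=True), eps)
    a = anchor / max(np.linalg.norm(anchor), eps)
    return f @ a


def fraction_above(similarities, threshold=0.999):
    """Share of patches whose similarity to the anchor exceeds the threshold."""
    return float(np.mean(similarities > threshold))


def similarity_histogram(similarities, bins=50):
    """Bin counts over the full cosine range [-1, 1]; `bins` is an arbitrary choice."""
    counts, edges = np.histogram(similarities, bins=bins, range=(-1.0, 1.0))
    return counts, edges


# Usage on toy 2-D features with a hand-picked anchor direction.
toy_features = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
toy_anchor = np.array([1.0, 0.0])
sims = cosine_to_anchor(toy_features, toy_anchor)
counts, edges = similarity_histogram(sims, bins=4)
print(sims, fraction_above(sims), counts)
```

Running the same functions separately on features sampled from negative and positive slides would give the two histograms per figure, with the portion above the threshold corresponding to the darkened bars.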