Weakly Supervised Contrastive Learning for Histopathology Patch Embeddings
Bodong Zhang, Xiwen Li, Hamid Manoochehri, Xiaoya Tang, Deepika Sirohi, Beatrice S. Knudsen, Tolga Tasdizen
TL;DR
WeakSupCon addresses the shortage of patch-level labels in histopathology by introducing a weakly supervised contrastive learning framework that exploits bag-level labels during encoder pretraining. It splits patch features into negative and positive bag groups and optimizes two losses on a shared encoder: a Similarity Loss for negative patches and a SimCLR Loss for patches from positive bags, combining them into a single objective. Across Camelyon16, RVT, and kidney metastasis datasets, WeakSupCon-pretrained encoders yield superior downstream MIL performance compared with self-supervised and supervised baselines and, in several cases, outperform state-of-the-art histopathology foundation models. Feature-space analyses reveal clearer separation between negative and positive patches and greater diversity among positive patches, explaining the improved MIL attention and accuracy. The approach enhances robustness to domain shift and reduces reliance on dense patch-level annotation, with code available for community use.
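The two-loss objective summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not give the exact form of the Similarity Loss, so it is assumed here to be the mean of (1 - cosine similarity) over pairs of negative-bag patch embeddings; the SimCLR term is the standard NT-Xent loss over two augmented views; and the names `similarity_loss`, `nt_xent_loss`, `weaksupcon_style_loss`, and the `weight` parameter are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity_loss(neg_embeddings):
    """ASSUMED form: mean (1 - cosine) over all pairs of negative-bag patches.

    Every patch in a negative bag is negative, so pulling these
    embeddings together is safe even without patch-level labels.
    """
    z = [normalize(v) for v in neg_embeddings]
    pairs = [(i, j) for i in range(len(z)) for j in range(i + 1, len(z))]
    if not pairs:
        return 0.0
    return sum(1.0 - dot(z[i], z[j]) for i, j in pairs) / len(pairs)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """SimCLR NT-Xent loss; view_a[i] and view_b[i] are two views of patch i."""
    n = len(view_a)
    if n == 0:
        return 0.0
    z = [normalize(v) for v in view_a + view_b]  # 2N unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same patch
        logits = [dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = dot(z[i], z[pos]) / temperature
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


def weaksupcon_style_loss(neg, pos_a, pos_b, weight=1.0, temperature=0.5):
    """Combine both terms; `weight` is a hypothetical balancing factor."""
    return similarity_loss(neg) + weight * nt_xent_loss(pos_a, pos_b, temperature)
```

Patches from positive bags may be either positive or negative, which is why this sketch treats them with an instance-discrimination loss rather than pulling them toward one another.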
Abstract
Digital histopathology whole slide images (WSIs) provide gigapixel-scale high-resolution images that are highly useful for disease diagnosis. However, digital histopathology image analysis faces significant challenges due to limited training labels, since manually annotating specific regions or small patches cropped from large WSIs requires substantial time and effort. Weakly supervised multiple instance learning (MIL) offers a practical and efficient solution by requiring only bag-level (slide-level) labels, while each bag typically contains multiple instances (patches). Most MIL methods directly use frozen image patch features generated by various image encoders as inputs and primarily focus on feature aggregation. Feature representation learning for encoder pretraining in MIL settings, by contrast, has largely been neglected. In our work, we propose a novel feature representation learning framework called weakly supervised contrastive learning (WeakSupCon) that incorporates bag-level label information during training. Our method does not rely on instance-level pseudo-labeling, yet it effectively separates patches with different labels in the feature space. Experimental results demonstrate that the image features generated by our WeakSupCon method lead to improved downstream MIL performance compared to self-supervised contrastive learning approaches on three datasets. Our code is available at github.com/BzhangURU/Paper_WeakSupCon_for_MIL.
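To make the MIL setting concrete, the following is a minimal sketch of how frozen patch features from one bag might be aggregated into a slide-level prediction. It is a generic attention-pooling illustration, not the aggregator used in our experiments: the linear attention score, the function names (`attention_pool`, `bag_probability`), and all parameters are assumptions introduced for this example, and in practice the attention and classifier weights would be learned from bag-level labels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_vector):
    """Aggregate one bag of frozen patch features into a single bag feature.

    Each patch gets a score from a (hypothetical) linear attention vector;
    the bag feature is the attention-weighted average of the patch features.
    """
    scores = [sum(w * x for w, x in zip(attention_vector, f)) for f in patch_features]
    weights = softmax(scores)
    dim = len(patch_features[0])
    bag = [sum(a * f[d] for a, f in zip(weights, patch_features)) for d in range(dim)]
    return bag, weights


def bag_probability(bag_feature, classifier_weights, bias=0.0):
    """Logistic classifier on the bag feature; only the bag label supervises it."""
    logit = sum(w * x for w, x in zip(classifier_weights, bag_feature)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the patch features are frozen in this setup, the quality of the encoder that produced them bounds what any aggregator can achieve, which is the motivation for improving encoder pretraining.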
