CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi, Fayaz Ali Dharejo, Naoufel Werghi, Mohammed Bennamoun
TL;DR
CPLIP addresses zero-shot learning in histopathology by enabling unpaired, many-to-many alignment between images and text. It constructs a comprehensive pathology prompt dictionary, builds textual and visual concept bags using MI-Zero, GPT-3, and PLIP, and then trains with MIL-NCE to align multiple interrelated concepts. Across tile-level, WSI-level, and segmentation tasks on nine public datasets, CPLIP achieves state-of-the-art zero-shot performance compared with existing vision-language approaches, while remaining interpretable and transferable. The results demonstrate the value of enriched textual prompts and diverse visual content for pathology vision-language models, and extensive ablations and supplementary materials support reproducibility.
Abstract
This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/
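The many-to-many contrastive step can be illustrated with a minimal MIL-NCE-style loss. This is a sketch under stated assumptions, not the authors' implementation: the function name `mil_nce_loss`, the bag layout (one list of image embeddings and one list of text embeddings per concept), the temperature of 0.07, and the choice of negatives are all hypothetical, and plain Python lists stand in for real encoder outputs.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def mil_nce_loss(image_bags, text_bags, temperature=0.07):
    """MIL-NCE-style loss over paired bags (illustrative assumption).

    image_bags[i] and text_bags[i] hold the embeddings that belong to
    concept i. Every image-text pair inside the same concept counts as a
    positive; pairs that mix concept i with any other concept are negatives.
    """
    assert len(image_bags) == len(text_bags)
    imgs = [[l2_normalize(v) for v in bag] for bag in image_bags]
    txts = [[l2_normalize(v) for v in bag] for bag in text_bags]
    n = len(imgs)

    def sims(i, j):
        # Scaled cosine similarities between images of bag i and texts of bag j.
        return [dot(x, y) / temperature for x in imgs[i] for y in txts[j]]

    total = 0.0
    for i in range(n):
        pos = sims(i, i)
        neg = []
        for j in range(n):
            if j != i:
                neg += sims(i, j)  # images of concept i vs. texts of other concepts
                neg += sims(j, i)  # images of other concepts vs. texts of concept i
        # -log( sum(exp(pos)) / (sum(exp(pos)) + sum(exp(neg))) )
        total += logsumexp(pos + neg) - logsumexp(pos)
    return total / n


# Toy 2-D embeddings: concept 0 points along x, concept 1 along y.
image_bags = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
text_bags = [[[1.0, 0.05]], [[0.05, 1.0], [0.0, 1.0]]]

aligned = mil_nce_loss(image_bags, text_bags)
swapped = mil_nce_loss(image_bags, text_bags[::-1])  # deliberately mismatched bags
```

Because the positives are summed inside the logarithm, the loss only requires that some image-text pairs within a concept bag agree, which is what makes this kind of objective tolerant of unpaired, noisy bags. In this toy setup the correctly matched bags give a lower loss than the swapped ones.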
