All Centers Are at most a Few Tokens Apart: Knowledge Distillation with Domain Invariant Prompt Tuning
Amir Mohammad Ezzati, Alireza Malekhosseini, Armin Khosravi, Mohammad Hossein Rohban
TL;DR
Domain generalization remains a key bottleneck in histopathology due to slide staining and scanner variability. The paper introduces Domain Invariant Prompt Tuning (DIPT), which learns domain-specific tokens and aggregates them into domain-invariant class embeddings to distill from PLIP's text encoder. This dual approach—domain-aware prompt learning followed by class-generic prompt-based knowledge distillation—yields more robust cross-domain representations. Across CAMELYON17-WILDS and Kather19, DIPT substantially improves generalization metrics over zero-shot PLIP and prior baselines, indicating strong potential for clinical deployment in heterogeneous data environments.
Abstract
Domain generalization is critical in computational pathology (CPath) due to inherent domain shifts caused by variations in staining protocols, scanner devices, and imaging settings across clinical centers. Vision-language models (VLMs), such as PLIP-a pathology-tuned CLIP-trained on image-text pairs across diverse domains, serve as strong knowledge distillation sources. However, their zero-shot performance with predefined prompts remains limited due to sensitivity to prompt variations. Moreover, unlike natural images, histopathology centers lack semantic descriptors (e.g., 'sketch'), making it difficult to define domain-specific prompts for clinical centers. This requires a data-driven approach for learning domain-specific and ultimately class-generic continuous prompts. We propose Domain Invariant Prompt Tuning (DIPT) for knowledge distillation process, a novel step that learns multiple input tokens for each domain. These tokens are trained separately for each domain and are averaged across domains, leading to domain-invariant prompts. Our student model then distills knowledge from PLIP's text encoder by leveraging the prompts learned by DIPT. This leads to alignment of visual features with domain-invariant embeddings, enhancing generalization by training on multiple domains. Our method adds a significant improvement in average F1-score to existing state-of-the-art (SOTA) knowledge distillation approaches in domain generalization with histopathology datasets. This work helps the way of deploying robust CPath models in real-world clinical problems with heterogeneous data sources.
