Knowledge-enhanced Pretraining for Vision-language Pathology Foundation Model on Cancer Diagnosis
Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ge Wang, Ruifen Wang, Lifeng Wang, Xiaojun Yuan, Xin Sun, Ya Zhang, Kun Sun, Yanfeng Wang, Weidi Xie
TL;DR
This paper introduces KEEP (KnowledgE-Enhanced Pathology), a vision-language foundation model that systematically incorporates disease knowledge into pretraining for cancer diagnosis, and establishes knowledge-enhanced vision-language modeling as a powerful paradigm for advancing computational pathology.
Abstract
Vision-language foundation models have shown great promise in computational pathology but remain primarily data-driven, lacking explicit integration of medical knowledge. We introduce KEEP (KnowledgE-Enhanced Pathology), a foundation model that systematically incorporates disease knowledge into pretraining for cancer diagnosis. KEEP leverages a comprehensive disease knowledge graph encompassing 11,454 diseases and 139,143 attributes to reorganize millions of pathology image-text pairs into 143,000 semantically structured groups aligned with disease ontology hierarchies. This knowledge-enhanced pretraining aligns visual and textual representations within hierarchical semantic spaces, enabling deeper understanding of disease relationships and morphological patterns. Across 18 public benchmarks (over 14,000 whole-slide images) and 4 institutional rare cancer datasets (926 cases), KEEP consistently outperformed existing foundation models, showing substantial gains for rare subtypes. These results establish knowledge-enhanced vision-language modeling as a powerful paradigm for advancing computational pathology.
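The idea of reorganizing image-text pairs into groups aligned with a disease ontology can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy ontology, the synonym table, the substring matching rule, and the function names (`ancestors`, `match_disease`, `group_pairs`) are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy ontology: child disease -> parent disease (not from the paper).
PARENT = {
    "lung adenocarcinoma": "lung carcinoma",
    "lung squamous cell carcinoma": "lung carcinoma",
    "lung carcinoma": "carcinoma",
}

# Hypothetical synonym table: surface form -> canonical disease name.
SYNONYMS = {
    "luad": "lung adenocarcinoma",
    "lung adenocarcinoma": "lung adenocarcinoma",
    "lusc": "lung squamous cell carcinoma",
    "lung squamous cell carcinoma": "lung squamous cell carcinoma",
}


def ancestors(disease):
    """Return the chain from a disease up to the root of the toy ontology."""
    chain = [disease]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def match_disease(caption):
    """Return the canonical disease for the longest synonym found in the caption.

    Uses naive case-insensitive substring matching (an assumption for this
    sketch); returns None when no synonym occurs.
    """
    text = caption.lower()
    hits = [s for s in SYNONYMS if s in text]
    if not hits:
        return None
    return SYNONYMS[max(hits, key=len)]


def group_pairs(pairs):
    """Group (image_id, caption) pairs by matched disease; unmatched go under None."""
    groups = defaultdict(list)
    for image_id, caption in pairs:
        groups[match_disease(caption)].append((image_id, caption))
    return dict(groups)
```

In this sketch, each group key can be expanded with `ancestors(key)` to recover its position in the hierarchy, which is the kind of structure a hierarchy-aware alignment objective could draw on. A real system would need far more robust entity linking than substring matching.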
