EXAONEPath 1.0 Patch-level Foundation Model for Pathology
Juseung Yun, Yi Hu, Jinhyung Kim, Jongseong Jang, Soonyoung Lee
TL;DR
The paper identifies WSI-specific feature collapse in self-supervised, patch-based learning for digital pathology and mitigates it with EXAONEPath, a patch-level foundation model trained with DINO on Macenko-normalized patches. Incorporating stain normalization into pretraining yields more generalized, color-robust features, and the model achieves competitive performance across six patch-level downstream tasks with fewer WSIs and a smaller model size. The study also acknowledges that some collapse remains, which invites further research.
Abstract
Recent advances in digital pathology have produced numerous foundation models that apply self-supervised learning to patches extracted from gigapixel whole slide images (WSIs). While this approach leverages vast amounts of unlabeled data, we have discovered a significant issue: features extracted from these self-supervised models tend to cluster by individual WSI, a phenomenon we term WSI-specific feature collapse. This problem can limit a model's generalization ability and its performance on downstream tasks. To address it, we introduce EXAONEPath, a foundation model trained on patches that have undergone stain normalization. Stain normalization reduces color variability arising from different laboratories and scanners, enabling the model to learn more consistent features. EXAONEPath is trained on 285,153,903 patches extracted from 34,795 WSIs. Our experiments demonstrate that EXAONEPath significantly mitigates the feature collapse problem, indicating that the model has learned more generalized features rather than overfitting to individual WSI characteristics. We compared EXAONEPath with state-of-the-art models across six downstream task datasets, and our results show that EXAONEPath achieves superior performance relative to the number of WSIs used and the model's parameter count. This suggests that stain normalization substantially improves the model's efficiency and generalization.
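One simple way to check whether patch features cluster by WSI is to ask how often a patch's nearest neighbor in feature space comes from the same slide. The sketch below is only an illustrative diagnostic, not the measurement used in the paper; the function name `same_slide_neighbor_rate` and its inputs (a feature matrix and one slide identifier per patch) are assumptions made for this example.

```python
import numpy as np


def same_slide_neighbor_rate(features, slide_ids):
    """Fraction of patches whose nearest neighbor (cosine similarity)
    comes from the same WSI. Hypothetical helper for illustration only.

    features:  array of shape (num_patches, feature_dim)
    slide_ids: array of shape (num_patches,), one WSI identifier per patch
    """
    features = np.asarray(features, dtype=np.float64)
    slide_ids = np.asarray(slide_ids)

    # L2-normalize so that the dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)

    # Pairwise cosine similarity; exclude each patch from its own neighbors.
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)

    nearest = sim.argmax(axis=1)
    return float(np.mean(slide_ids[nearest] == slide_ids))


# Toy example: four patches forming two tight pairs in feature space.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [0.01, 1.0]])

# Each tight pair comes from one slide: features follow the slide.
collapsed_rate = same_slide_neighbor_rate(feats, [0, 1, 0, 1])

# Each tight pair spans both slides: features ignore the slide.
mixed_rate = same_slide_neighbor_rate(feats, [0, 0, 1, 1])
```

A rate well above what the slide proportions alone would produce by chance suggests that features encode slide identity, which is the behavior the abstract describes as WSI-specific feature collapse. The full similarity matrix grows quadratically with the number of patches, so this sketch is only practical on a subsample.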
