ActiveSSF: An Active-Learning-Guided Self-Supervised Framework for Long-Tailed Megakaryocyte Classification
Linghao Zhuang, Ying Zhang, Gege Yuan, Xingyue Zhao, Zhiping Jiang
TL;DR
ActiveSSF tackles megakaryocyte classification under background noise, long-tail subtype distributions, and morphological variability by coupling clinical-prior cell-region filtering with adaptive, prototype-guided sample selection in self-supervised pretraining. The two-stage pipeline extracts informative cellular regions and builds robust prototypes from labeled data to steer unlabeled data selection, using dynamic density-aware thresholds to emphasize rare subtypes. Across eleven megakaryocyte subtypes on a clinical dataset, ActiveSSF yields state-of-the-art results and substantial gains for rare classes, demonstrating improved diagnostic potential for myelodysplastic syndrome. The framework's integration of region filtering, prototype clustering, and adaptive sampling offers a practical path toward scalable, accurate automated blood-cell analysis in clinical settings.
Abstract
Precise classification of megakaryocytes is crucial for diagnosing myelodysplastic syndromes. Although self-supervised learning has shown promise in medical image analysis, its application to classifying megakaryocytes in stained slides faces three main challenges: (1) pervasive background noise that obscures cellular details, (2) a long-tailed distribution that limits data for rare subtypes, and (3) complex morphological variations leading to high intra-class variability. To address these issues, we propose the ActiveSSF framework, which integrates active learning with self-supervised pretraining. Specifically, our approach combines three components: Gaussian filtering with K-means clustering and HSV analysis, augmented by clinical prior knowledge, for accurate region-of-interest extraction; an adaptive sample selection mechanism that dynamically adjusts similarity thresholds to mitigate class imbalance; and prototype clustering on labeled samples to handle morphological complexity. Experimental results on clinical megakaryocyte datasets demonstrate that ActiveSSF achieves state-of-the-art performance and significantly improves recognition accuracy for rare subtypes. Together, these results indicate the practical potential of ActiveSSF in clinical settings.
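To make the adaptive, prototype-guided selection step concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes one mean-embedding prototype per class (a simplification of the prototype clustering described above), cosine similarity as the similarity measure, and a hypothetical linear rule that lowers the similarity threshold for classes with fewer labeled samples. The function names and the `tau_min` and `tau_max` parameters are illustrative and do not come from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_prototypes(embeddings, labels):
    """Assumption: one prototype per class, the mean of its labeled embeddings."""
    groups = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        groups[lab].append(vec)
    protos = {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
              for lab, vecs in groups.items()}
    counts = {lab: len(vecs) for lab, vecs in groups.items()}
    return protos, counts


def adaptive_thresholds(counts, tau_min=0.6, tau_max=0.9):
    """Hypothetical rule: the rarer the class, the lower its threshold."""
    n_max = max(counts.values())
    return {lab: tau_min + (tau_max - tau_min) * n / n_max
            for lab, n in counts.items()}


def select_unlabeled(unlabeled, protos, thresholds):
    """Keep samples whose best-matching prototype clears that class's threshold."""
    selected = []
    for idx, vec in enumerate(unlabeled):
        best = max(protos, key=lambda lab: cosine(vec, protos[lab]))
        if cosine(vec, protos[best]) >= thresholds[best]:
            selected.append((idx, best))
    return selected


# Toy 2-D embeddings: three labeled "common" cells and one labeled "rare" cell.
labeled = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
labels = ["common", "common", "common", "rare"]
unlabeled = [[1.0, 0.05], [0.5, 0.8], [0.7, 0.7]]

protos, counts = class_prototypes(labeled, labels)
thresholds = adaptive_thresholds(counts)
selected = select_unlabeled(unlabeled, protos, thresholds)
print(thresholds)
print(selected)
```

In this toy setting the rare class receives a lower threshold than the common class, so a moderately similar unlabeled sample is admitted for the rare class, while an ambiguous sample whose nearest prototype is the common class is rejected. A real pipeline would use learned embeddings of the extracted cell regions and possibly several prototypes per subtype.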
