Self-Supervised Multiple Instance Learning for Acute Myeloid Leukemia Classification
Salome Kazeminia, Max Joosten, Dragan Bosnacki, Carsten Marr
TL;DR
This work tackles AML genetic subtype classification from blood smears under weak labeling by pre-training an MIL encoder with one of three self-supervised learning (SSL) methods (SimCLR, SwAV, or DINO) and integrating it into an attention-based MIL classifier. The same dataset drives both SSL pre-training, which uses no labels, and MIL training, which uses only weak labels, enabling data-efficient, cost-effective analysis without extensive single-cell annotations. Results show that SSL-pretrained encoders achieve performance comparable to encoders pre-trained with full supervision, with SimCLR often delivering the best classification results and attention that highlights malignant cells. The approach demonstrates the practical potential of SSL in MIL for medical image analysis and supports data-efficient, explainable AI for AML diagnosis.
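To make the attention-based MIL step concrete, the following is a minimal NumPy sketch of attention pooling over one bag of single-cell features. It is an illustration under stated assumptions, not the authors' implementation: the function name `attention_mil_pool`, the parameter matrices `V` and `w`, the tanh scoring form, and all dimensions are hypothetical choices, and the random features stand in for the output of an SSL-pretrained encoder.

```python
import numpy as np


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance features into one bag embedding.

    instances: (n_cells, feat_dim) features, one row per single-cell image
    V:         (feat_dim, attn_dim) attention projection (assumed form)
    w:         (attn_dim,) attention scoring vector (assumed form)
    """
    # One unnormalized attention score per cell.
    scores = np.tanh(instances @ V) @ w
    # Numerically stable softmax over the cells in the bag.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()
    # Bag embedding is the attention-weighted average of cell features.
    bag_embedding = weights @ instances
    return bag_embedding, weights


# Hypothetical bag: 50 cells, each with a 16-dimensional feature vector.
rng = np.random.default_rng(0)
n_cells, feat_dim, attn_dim = 50, 16, 8
cell_features = rng.normal(size=(n_cells, feat_dim))
V = rng.normal(scale=0.1, size=(feat_dim, attn_dim))
w = rng.normal(scale=0.1, size=attn_dim)

bag, attn = attention_mil_pool(cell_features, V, w)
```

The bag embedding would then feed a classifier trained on the bag-level label, while the per-cell weights in `attn` are what can be inspected to see which cells the model attends to.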
Abstract
Automated disease diagnosis using medical image analysis relies on deep learning, often requiring large labeled datasets for supervised model training. Diseases like Acute Myeloid Leukemia (AML) pose challenges because annotations at the single-cell level are scarce and costly. Multiple Instance Learning (MIL) addresses weakly labeled scenarios but requires powerful encoders, which are typically trained with labeled data. In this study, we explore Self-Supervised Learning (SSL) as a pre-training approach for MIL-based AML subtype classification from blood smears, removing the need for labeled data during encoder training. We investigate three state-of-the-art SSL methods, SimCLR, SwAV, and DINO, and compare their performance against supervised pre-training. Our findings show that SSL-pretrained encoders achieve performance comparable to supervised pre-training, showcasing the potential of SSL in MIL. This result offers a cost-effective and data-efficient solution for AI-based disease diagnosis.
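As an illustration of how an encoder can be trained without labels, here is a minimal NumPy sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss used by SimCLR, one of the three SSL methods named above. It is a sketch under assumptions rather than the configuration used in this study: the function name `nt_xent_loss`, the temperature of 0.5, the batch size, and the embedding dimension are all hypothetical, and the random arrays stand in for encoder outputs of two augmented views of the same cell images.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two batches of embeddings of the same images.

    z1, z2: (n, dim) embeddings; row i of z1 and row i of z2 come from
            two augmentations of the same image (a positive pair).
    temperature: assumed value, not taken from the paper.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Index of the positive partner for each of the 2n rows.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Stable log-sum-exp over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_denominator = sim_max[:, 0] + np.log(np.exp(sim - sim_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), pos_idx] - log_denominator
    return -log_prob.mean()


rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 32))
# Nearly identical second view: positives are close, so the loss is low.
aligned_loss = nt_xent_loss(view_a, view_a + 0.01 * rng.normal(size=(8, 32)))
# Unrelated second view: positives are no closer than negatives.
random_loss = nt_xent_loss(view_a, rng.normal(size=(8, 32)))
```

Minimizing this loss pulls the two views of each image together and pushes other images apart, which is why no class labels are needed at this stage.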
