CytoDINO: Risk-Aware and Biologically-Informed Adaptation of DINOv3 for Bone Marrow Cytomorphology
Aziz Muminov, Anne Pham
TL;DR
CytoDINO addresses the diagnostic challenges of bone marrow cytomorphology by fine-tuning a large foundation model (DINOv3) with parameter-efficient LoRA and a Hierarchical Focal Loss that encodes biological lineage information and prioritizes clinically dangerous errors. The method achieves state-of-the-art performance on the MLL BMC dataset with 88.2% weighted F1. It demonstrates practical viability by training only 8% of the model's parameters on consumer hardware and by supporting a high-accuracy selective-prediction workflow. Key contributions include biologically-informed label smoothing, a critical error penalty matrix, and a decoder head that captures distributed morphological features, enabling robust, lineage-aware embeddings and strong zero-shot transfer. These properties support scalable clinical deployment with a human in the loop for uncertain cases and potential integration into patient-level diagnostic pipelines.
Abstract
Bone marrow cell cytomorphology analysis is critical for the diagnosis of hematological malignancies but remains a labor-intensive process subject to significant inter-observer variability. While recent foundation models have shown promise in computational pathology, they often require extensive computational resources and fail to account for the asymmetric risks associated with clinical misdiagnosis. We introduce CytoDINO, a framework that achieves state-of-the-art performance on the Munich Leukemia Laboratory (MLL) dataset by fine-tuning DINOv3 using Low-Rank Adaptation (LoRA). Our primary contribution is a novel Hierarchical Focal Loss with Critical Penalties, which encodes biological relationships between cell lineages and explicitly penalizes clinically dangerous misclassifications (e.g., classifying blasts as normal cells). CytoDINO achieves an 88.2% weighted F1 score and a 76.5% macro F1 score on a held-out test set of 21 cell classes. By using parameter-efficient fine-tuning with only 8% trainable parameters on a single NVIDIA RTX 5080, we demonstrate that consumer-grade hardware can match specialized infrastructure. Furthermore, confidence-based selective prediction yields 99.5% accuracy on 67% of samples, suggesting a viable pathway for clinical deployment in which high-uncertainty cases are flagged for expert review.
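
The selective-prediction idea can be illustrated with a minimal sketch. It assumes that confidence is the maximum softmax probability and that a single fixed threshold decides which predictions are accepted; the abstract specifies neither the confidence measure nor the threshold, and the function names, threshold value, and toy data below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_predict(logits_batch, threshold):
    """Return (predicted class, confidence, accepted) for each sample.

    Assumption: confidence = maximum softmax probability. A sample is
    accepted when its confidence reaches the threshold; otherwise it
    would be flagged for expert review.
    """
    results = []
    for logits in logits_batch:
        probs = softmax(logits)
        conf = max(probs)
        pred = probs.index(conf)
        results.append((pred, conf, conf >= threshold))
    return results


def coverage_and_accuracy(results, labels):
    """Coverage = fraction of samples accepted; accuracy is computed on accepted samples only."""
    accepted = [(pred, y) for (pred, _, ok), y in zip(results, labels) if ok]
    coverage = len(accepted) / len(labels)
    accuracy = (
        sum(p == y for p, y in accepted) / len(accepted)
        if accepted
        else float("nan")
    )
    return coverage, accuracy


# Toy data (hypothetical): four samples, three classes.
logits = [
    [4.0, 0.0, 0.0],  # confident, predicts class 0
    [0.2, 0.1, 0.0],  # uncertain
    [0.0, 5.0, 0.0],  # confident, predicts class 1
    [1.0, 0.9, 0.8],  # uncertain
]
labels = [0, 1, 1, 2]

results = selective_predict(logits, threshold=0.9)
coverage, accuracy = coverage_and_accuracy(results, labels)
print(f"coverage={coverage:.2f}, selective accuracy={accuracy:.2f}")
```

Raising the threshold trades coverage for accuracy on the accepted subset. An operating point such as the reported 99.5% accuracy at 67% coverage would correspond to one particular threshold chosen on held-out data.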
