CytoDINO: Risk-Aware and Biologically-Informed Adaptation of DINOv3 for Bone Marrow Cytomorphology

Aziz Muminov, Anne Pham

TL;DR

CytoDINO tackles the diagnostic challenges of bone marrow cytomorphology by fine-tuning a large foundation model (DINOv3) with a parameter-efficient LoRA approach and a Hierarchical Focal Loss that encodes biological lineage information and prioritizes clinically dangerous errors. The method achieves state-of-the-art performance on the MLL BMC dataset with 88.2% weighted F1 and demonstrates practical viability through 8% trainable parameters on consumer hardware and a high-accuracy selective-prediction workflow. Key contributions include biologically-informed label smoothing, a critical error penalty matrix, and a decoder head that captures distributed morphological features, enabling robust, lineage-aware embeddings and strong zero-shot transfer. These properties support scalable clinical deployment with a human-in-the-loop for uncertain cases and potential integration into patient-level diagnostic pipelines.

Abstract

Bone marrow cell cytomorphology analysis is critical for the diagnosis of hematological malignancies but remains a labor-intensive process subject to significant inter-observer variability. While recent foundation models have shown promise in computational pathology, they often require extensive computational resources and fail to account for the asymmetric risks associated with clinical misdiagnosis. We introduce CytoDINO, a framework that achieves state-of-the-art performance on the Munich Leukemia Laboratory (MLL) dataset by fine-tuning DINOv3 using Low-Rank Adaptation (LoRA). Our primary contribution is a novel Hierarchical Focal Loss with Critical Penalties, which encodes biological relationships between cell lineages and explicitly penalizes clinically dangerous misclassifications (e.g., classifying blasts as normal cells). CytoDINO achieves an 88.2% weighted F1 score and 76.5% macro F1 on a held-out test set of 21 cell classes. By utilizing parameter-efficient fine-tuning with only 8% trainable parameters on a single NVIDIA RTX 5080, we demonstrate that consumer-grade hardware can match specialized infrastructure. Furthermore, confidence-based selective prediction yields 99.5% accuracy on 67% of samples, suggesting a viable pathway for clinical deployment where high-uncertainty cases are flagged for expert review.
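
The selective-prediction idea in the abstract can be illustrated with a minimal sketch. It assumes that "confidence" means the maximum softmax probability and that a single fixed threshold decides which samples go to expert review. The abstract does not state either detail, so the function names, the threshold value, and the toy logits below are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def selective_predict(logits_batch, threshold):
    """Return (predicted_class, confidence, accepted) for each sample.

    Assumption: confidence is the maximum softmax probability, and a
    sample is accepted when that value is at least `threshold`.
    """
    results = []
    for logits in logits_batch:
        probs = softmax(logits)
        conf = max(probs)
        pred = probs.index(conf)
        results.append((pred, conf, conf >= threshold))
    return results

def coverage_and_accuracy(results, labels):
    """Coverage = fraction accepted; accuracy is measured on accepted samples only."""
    accepted = [(pred, y) for (pred, _, ok), y in zip(results, labels) if ok]
    coverage = len(accepted) / len(labels) if labels else 0.0
    if not accepted:
        return coverage, None  # nothing accepted, accuracy undefined
    accuracy = sum(pred == y for pred, y in accepted) / len(accepted)
    return coverage, accuracy

# Toy example with three classes (hypothetical logits, not paper data).
logits_batch = [
    [4.0, 0.0, 0.0],   # confident
    [0.2, 0.1, 0.0],   # uncertain, would be flagged for review
    [0.0, 5.0, 0.0],   # confident
    [1.0, 0.9, 0.8],   # uncertain, would be flagged for review
]
labels = [0, 1, 1, 2]

results = selective_predict(logits_batch, threshold=0.9)
coverage, accuracy = coverage_and_accuracy(results, labels)
flagged = [i for i, (_, _, ok) in enumerate(results) if not ok]
```

In this toy run, two of the four samples pass the threshold, and the other two are the ones a human reviewer would see. Raising the threshold trades coverage for accuracy on the accepted subset, which is the trade-off behind the reported 99.5% accuracy on 67% of samples.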

Paper Structure

This paper contains 29 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Representative bone marrow cells from six hematopoietic lineages showing distinct morphological characteristics. May-Grünwald-Giemsa staining, 40$\times$ magnification. Each lineage displays unique nuclear and cytoplasmic features that inform our hierarchical classification approach.
  • Figure 2: Augmentation pipeline applied during training. Representative transformations include geometric operations (flips, rotations), photometric perturbations (color jitter, brightness/contrast adjustments), and noise injection (Gaussian blur, ISO noise). The rightmost panel shows the final normalized image using dataset-specific statistics ($\mu = [0.5631, 0.4959, 0.7355]$, $\sigma = [0.2419, 0.2835, 0.1761]$) computed from the entire MLL dataset. These transformations simulate inter-laboratory variability in staining protocols and imaging equipment while maintaining biological plausibility.
  • Figure 3: t-SNE visualization of learned embeddings from the Transformer Decoder Head. Left: Embeddings colored by cell class (21 classes) show clear separation with minimal overlap. Right: Embeddings colored by lineage reveal hierarchical organization, with the granulocytic lineage (blue) forming a continuous trajectory from immature blasts to mature neutrophils. The model learns biologically meaningful representations that cluster related cell types while maintaining discriminative power for fine-grained classification.
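
The dataset-specific normalization mentioned in the Figure 2 caption amounts to a per-channel standardization, (x - μ) / σ. The sketch below applies the reported statistics. It assumes RGB channel order and pixel values already scaled to [0, 1]. The caption states neither, and the helper names are hypothetical.

```python
# Statistics reported in the Figure 2 caption for the MLL dataset.
# Assumption: the three values correspond to R, G, B in that order.
MLL_MEAN = (0.5631, 0.4959, 0.7355)
MLL_STD = (0.2419, 0.2835, 0.1761)

def normalize_pixel(rgb, mean=MLL_MEAN, std=MLL_STD):
    """Standardize one pixel: (channel - mean) / std for each channel.

    Assumption: `rgb` holds floats in [0, 1].
    """
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))

def normalize_image(image, mean=MLL_MEAN, std=MLL_STD):
    """Standardize an image given as rows of (r, g, b) tuples."""
    return [[normalize_pixel(px, mean, std) for px in row] for row in image]

# A pixel equal to the dataset mean maps to zero in every channel.
mean_pixel = normalize_pixel(MLL_MEAN)

# A 1x2 toy image: one mean-colored pixel and one pure-white pixel.
toy = normalize_image([[MLL_MEAN, (1.0, 1.0, 1.0)]])
```

The blue-channel mean is the highest of the three, which is consistent with the blue-violet tones typical of May-Grünwald-Giemsa staining. Under this standardization a pure-white background pixel sits about 1.5 standard deviations above the blue-channel mean.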