Efficient Fine-Tuning of DINOv3 Pretrained on Natural Images for Atypical Mitotic Figure Classification (MIDOG 2025 Task 2 Winner)

Guillaume Balezo, Hana Feki, Raphaël Bourgade, Lily Monnier, Matthieu Blons, Alice Blondel, Etienne Decencière, Albert Pla Planas, Thomas Walter

TL;DR

This work addresses atypical mitotic figure (AMF) classification across diverse histopathology domains by fine-tuning a generalist DINOv3-H+ vision transformer with LoRA, which keeps training efficient (~1.3M trainable parameters). It combines extensive domain-aware data augmentation with a Domain-Weighted Focal Loss to counter cross-domain variability and class imbalance, and it ranked first on the MIDOG 2025 Task 2 final test set. Key findings include robustness across domains and the value of natural-image pretraining for biomedical tasks, with evidence that SSL-based mitosis pretraining can further improve representations. The study also notes practical considerations for large models and suggests future work on histology-focused SSL, distillation for efficiency, and closer integration with Task 1 detection pipelines.

Abstract

Atypical mitotic figures (AMFs) represent abnormal cell division associated with poor prognosis. Yet their detection remains difficult due to low prevalence, subtle morphology, and inter-observer variability. The MIDOG 2025 challenge introduces a benchmark for AMF classification across multiple domains. In this work, we fine-tuned the recently published DINOv3-H+ vision transformer, pretrained on natural images, using low-rank adaptation (LoRA), training only ~1.3M parameters in combination with extensive augmentation and a domain-weighted Focal Loss to handle domain heterogeneity. Despite the domain gap, our fine-tuned DINOv3 transfers effectively to histopathology, reaching first place on the final test set. These results highlight the advantages of DINOv3 pretraining and underline the efficiency and robustness of our fine-tuning strategy, yielding state-of-the-art results for the atypical mitosis classification challenge in MIDOG 2025.
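
As a rough sanity check on the ~1.3M trainable-parameter figure, the sketch below counts the parameters LoRA adds to a set of linear layers. Only the rank (8) comes from the paper summary. The hidden width (1280), the number of transformer blocks (32), and the choice of adapting two projections per block are assumptions made for illustration, and the function name `lora_param_count` is hypothetical.

```python
def lora_param_count(d_in, d_out, rank):
    """Extra trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA keeps the pretrained weight W frozen and learns a low-rank update
    B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Assumed values for illustration only (not stated in the text above).
HIDDEN_DIM = 1280        # assumed transformer width
NUM_BLOCKS = 32          # assumed number of transformer blocks
ADAPTED_PER_BLOCK = 2    # assumed: two adapted projections per block

RANK = 8                 # LoRA rank reported for this method

per_layer = lora_param_count(HIDDEN_DIM, HIDDEN_DIM, RANK)
total = NUM_BLOCKS * ADAPTED_PER_BLOCK * per_layer

print(f"per adapted layer: {per_layer:,}")
print(f"total LoRA parameters: {total:,}")
```

Under these assumptions the total comes to roughly 1.31M, which is in line with the reported ~1.3M. A different choice of adapted layers would change the count.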

Paper Structure

This paper contains 6 sections, 2 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overview of our method during training: Input images are augmented (multi-Macenko, small translations, shear, coarse dropout, rotations, etc.) and normalized with ImageNet statistics. The classifier is a DINOv3-H+ pretrained on the LVD-1689M natural image dataset, fine-tuned with LoRA (rank 8, $\alpha=16$, $\sim$1.3M trainable parameters) and followed by a linear head on the class token with sigmoid activation to output probabilities. Optimization is performed with a Domain-Weighted Focal Loss, which combines Focal Loss for class imbalance with domain reweighting to address dataset heterogeneity.
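
The caption describes the Domain-Weighted Focal Loss only at a high level, so the following is a minimal sketch of one way to combine the two ingredients, not the authors' exact formulation. It assumes a binary label (1 = atypical), a sigmoid probability per sample, and inverse-frequency domain weights normalized to average 1 over the batch. The weighting scheme, the default `gamma` and `alpha` values, and all function names are assumptions. Plain Python is used for readability; a training pipeline would use tensor operations instead.

```python
import math
from collections import Counter


def focal_loss(p, y, gamma=2.0, alpha=0.5, eps=1e-7):
    """Binary focal loss for one sample with predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights samples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def inverse_frequency_weights(domains):
    """Assumed scheme: weight each domain by n / (k * count), so rare domains count more."""
    counts = Counter(domains)
    n, k = len(domains), len(counts)
    return {d: n / (k * c) for d, c in counts.items()}


def domain_weighted_focal_loss(probs, labels, domains, gamma=2.0, alpha=0.5):
    """Mean of per-sample focal losses, each scaled by its domain weight."""
    weights = inverse_frequency_weights(domains)
    total = sum(
        weights[d] * focal_loss(p, y, gamma, alpha)
        for p, y, d in zip(probs, labels, domains)
    )
    return total / len(probs)


# Toy batch: three samples from domain "a", one from the rarer domain "b".
probs = [0.9, 0.8, 0.3, 0.4]
labels = [1, 1, 0, 1]
domains = ["a", "a", "a", "b"]
print(domain_weighted_focal_loss(probs, labels, domains))
```

With this weighting, the single sample from domain "b" receives weight 2.0 while each sample from domain "a" receives about 0.67, so an under-represented domain contributes as much to the loss as a well-represented one.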