Challenging DINOv3 Foundation Model under Low Inter-Class Variability: A Case Study on Fetal Brain Ultrasound

Edoardo Conti; Riccardo Rosati; Lorenzo Federici; Adriano Mancini; Maria Chiara Fiorentin

TL;DR

This work tackles the challenge of discriminating closely related fetal brain planes in ultrasound (US) when inter-class variability is low. It introduces the FetalUS-188K multicenter benchmark and evaluates a DINOv3-based self-supervised pretraining approach, comparing domain-adaptive pretraining on fetal US data against initialization from natural-image weights for classifying the transthalamic (TT), transventricular (TV), and transcerebellar (TC) planes. Domain-specific pretraining yields substantial gains (up to 20% in weighted F1) and preserves the subtle echogenic cues needed for fine-grained discrimination, whereas generic foundation models fail to generalize in this setting. These findings imply that deploying foundation models in clinical fetal ultrasound requires domain-tailored pretraining and carefully designed transfer strategies to ensure robust, clinically reliable plane identification.

Abstract

Purpose: This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability conditions. While recent vision foundation models such as DINOv3 have shown remarkable transferability across medical domains, their ability to discriminate anatomically similar structures has not been systematically investigated. We address this gap by focusing on three fetal brain standard planes, transthalamic (TT), transventricular (TV), and transcerebellar (TC), which exhibit highly overlapping anatomical features and pose a critical challenge for reliable biometric assessment.

Methods: To ensure a fair and reproducible evaluation, all publicly available fetal ultrasound datasets were curated and aggregated into a unified multicenter benchmark, FetalUS-188K, comprising more than 188,000 annotated images from heterogeneous acquisition settings. DINOv3 was pretrained in a self-supervised manner to learn ultrasound-aware representations. The learned features were then evaluated through standardized adaptation protocols, including linear probing with a frozen backbone and full fine-tuning, under two initialization schemes: (i) pretraining on FetalUS-188K and (ii) initialization from natural-image DINOv3 weights.

Results: Models pretrained on fetal ultrasound data consistently outperformed those initialized from natural-image weights, with weighted F1-score improvements of up to 20 percent. Domain-adaptive pretraining enabled the network to preserve subtle echogenic and structural cues crucial for distinguishing intermediate planes such as TV.

Conclusion: The results demonstrate that generic foundation models fail to generalize under low inter-class variability, whereas domain-specific pretraining is essential to achieve robust and clinically reliable representations in fetal brain ultrasound imaging.
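The headline metric above is the weighted F1-score over the three planes. The abstract does not say how it was computed, so the following is only a minimal sketch under the usual support-weighted definition (the one scikit-learn uses for `average="weighted"`). The function name `weighted_f1` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred, labels=("TT", "TV", "TC")):
    """Support-weighted mean of per-class F1 scores.

    Assumption: the standard definition, in which each class's F1 is
    weighted by its share of the true labels. The paper's exact
    evaluation code is not given in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))

    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        score += (support[label] / total) * f1
    return score


# Toy labels (hypothetical, not from FetalUS-188K): the intermediate TV
# plane is confused with both of its neighbours.
toy_true = ["TT", "TT", "TT", "TV", "TV", "TC"]
toy_pred = ["TT", "TT", "TV", "TV", "TC", "TC"]
toy_score = weighted_f1(toy_true, toy_pred)
```

Because each class's F1 is weighted by its support, errors on a frequent plane move the score more than errors on a rare one, so per-class F1 for TV is still worth inspecting alongside the aggregate.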
