Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos
Pramit Saha, Divyanshu Mishra, Netzahualcoyotl Hernandez-Cruz, Olga Patey, Aris Papageorghiou, Yuki M. Asano, J. Alison Noble
TL;DR
This work tackles CHD detection in fetal ultrasound under data scarcity and privacy constraints by reframing detection as normality modeling. It introduces Sparse Tube Ultrasound Distillation (STUD), a privacy-preserving, self-supervised video anomaly detection approach that learns representations of the healthy fetal heart from sparse spatio-temporal tubes in a DINO-like teacher-student framework, enabling zero-shot CHD detection with a simple KNN classifier. Because data cannot be shared across sites, it also proposes Divergence Vector-guided Model Merging (DivMerge), a two-step process: the geometric median of the site models provides a denoised reference, and divergence vectors from that reference are used to weight site models and selectively retain parameters, yielding a merged model without data exchange. On five real-world hospital datasets, DivMerge outperforms site-specific and centralized baselines and generalizes zero-shot across domain shifts; on external test sets, the merged model improves on site-specific models by 23.77% in accuracy and 30.13% in F1-score. Key equations include the geometric median objective $\theta^* = \arg\min_{\theta} \sum_i w_i \|\theta_i - \theta\|_2$ and the divergence-guided weighting $\alpha_i = \exp(-\lambda \|\Delta_i\|_2)$ with $\Delta_i = \theta_i - \theta^*$, which underpin the DivMerge mechanism and its selective parameter retention rule $\theta_f^p = \theta_i^p$ if $|\theta_i^p| \geq \gamma |\Delta_i^p|\theta^{*p}$, otherwise replaced by $\theta^{*p}$. Significance: the framework enables privacy-preserving, cross-site CHD detection in settings where data sharing is restricted, while maintaining or improving detection performance on unseen, domain-shifted data.
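The geometric median objective and the divergence-guided weighting above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes each site model has been flattened into a parameter vector, solves the median objective with Weiszfeld iterations (one standard solver, chosen here for illustration), and uses hypothetical names (`geometric_median`, `divergence_weights`, `lam`) and toy values. The selective retention rule is left out because its exact form is not clear from the summary alone.

```python
import numpy as np

def geometric_median(thetas, weights=None, n_iter=500, eps=1e-8):
    """Weiszfeld iterations for argmin_theta sum_i w_i * ||theta_i - theta||_2.

    thetas: array of shape (n_sites, n_params), one flattened model per row.
    """
    thetas = np.asarray(thetas, dtype=float)
    w = np.ones(len(thetas)) if weights is None else np.asarray(weights, dtype=float)
    median = np.average(thetas, axis=0, weights=w)  # start from the weighted mean
    for _ in range(n_iter):
        dist = np.linalg.norm(thetas - median, axis=1)
        inv = w / np.maximum(dist, eps)  # guard against division by zero
        new_median = (inv[:, None] * thetas).sum(axis=0) / inv.sum()
        converged = np.linalg.norm(new_median - median) < eps
        median = new_median
        if converged:
            break
    return median

def divergence_weights(thetas, median, lam=1.0):
    """Divergence vectors Delta_i = theta_i - theta* and alpha_i = exp(-lam * ||Delta_i||_2)."""
    deltas = np.asarray(thetas, dtype=float) - median
    alphas = np.exp(-lam * np.linalg.norm(deltas, axis=1))
    return deltas, alphas

# Toy example (values are made up): four sites agree, one has drifted.
site_params = np.array([
    [1.0, 1.0],
    [1.2, 0.8],
    [0.8, 1.2],
    [1.1, 1.1],
    [5.0, 5.0],  # drifted site
])
theta_star = geometric_median(site_params)
deltas, alphas = divergence_weights(site_params, theta_star, lam=1.0)
```

In this toy setting the geometric median stays close to the cluster of agreeing sites, whereas the plain mean is pulled toward the drifted one, and the drifted site receives the smallest weight $\alpha_i$.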
Abstract
Congenital Heart Disease (CHD) is one of the leading causes of fetal mortality, yet the scarcity of labeled CHD data and strict privacy regulations surrounding fetal ultrasound (US) imaging present significant challenges for the development of deep learning-based models for CHD detection. Centralized collection of large real-world datasets for rare conditions, such as CHD, from large populations requires significant coordination and resources. In addition, data governance rules increasingly prevent data sharing between sites. To address these challenges, we introduce, for the first time, a privacy-preserving, zero-shot CHD detection framework that formulates CHD detection as a normality modeling problem integrated with model merging. In our framework, dubbed Sparse Tube Ultrasound Distillation (STUD), each hospital site first trains a sparse video tube-based self-supervised video anomaly detection (VAD) model on normal fetal heart US clips with a self-distillation loss. This enables site-specific models to independently learn the distribution of healthy cases. To aggregate knowledge across the decentralized models while maintaining privacy, we propose a Divergence Vector-Guided Model Merging approach, DivMerge, that combines site-specific models into a single VAD model without data exchange. Our approach preserves rich, domain-agnostic spatio-temporal representations, ensuring generalization to unseen CHD cases. We evaluated our approach on real-world fetal US data collected from 5 hospital sites. Our merged model outperformed site-specific models by 23.77% in accuracy and 30.13% in F1-score on external test sets.
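To make the normality modeling idea concrete, the sketch below scores a clip by how far its embedding lies from a bank of embeddings of healthy clips, using a k-nearest-neighbor rule. It is an illustration under stated assumptions, not the paper's pipeline: the embeddings are random stand-ins for encoder outputs, and the function name `knn_anomaly_score`, the cosine-similarity metric, and `k=5` are all hypothetical choices.

```python
import numpy as np

def knn_anomaly_score(queries, normal_bank, k=5):
    """Anomaly score = 1 - mean cosine similarity to the k nearest normal embeddings."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    b = normal_bank / np.linalg.norm(normal_bank, axis=1, keepdims=True)
    sim = q @ b.T                        # (n_queries, n_normal) cosine similarities
    topk = np.sort(sim, axis=1)[:, -k:]  # k most similar normal embeddings per query
    return 1.0 - topk.mean(axis=1)

# Synthetic stand-ins for embeddings (assumption: a real system would obtain
# these from the trained video encoder, which is not modeled here).
rng = np.random.default_rng(0)
normal_bank = rng.normal(loc=1.0, scale=0.1, size=(100, 16))   # healthy reference clips
normal_clip = rng.normal(loc=1.0, scale=0.1, size=(1, 16))     # unseen healthy clip
abnormal_clip = rng.normal(loc=-1.0, scale=0.1, size=(1, 16))  # clip far from normality

scores = knn_anomaly_score(np.vstack([normal_clip, abnormal_clip]), normal_bank)
```

Because only healthy clips populate the reference bank, no CHD labels are needed to build it; a clip is flagged when its score exceeds a threshold, which would have to be chosen on held-out data.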
