Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos

Pramit Saha, Divyanshu Mishra, Netzahualcoyotl Hernandez-Cruz, Olga Patey, Aris Papageorghiou, Yuki M. Asano, J. Alison Noble

TL;DR

This work tackles CHD detection in fetal ultrasound under data scarcity and privacy constraints by reframing detection as normality modeling. It introduces Sparse Tube Ultrasound Distillation (STUD), a privacy-preserving, self-supervised video anomaly approach that learns healthy fetal heart representations using sparse spatio-temporal tubes and a DINO-like teacher-student framework, enabling zero-shot CHD detection with a simple KNN classifier. To overcome cross-site data sharing limitations, it proposes Divergence Vector-guided Model Merging (DiVMerge), a two-step process using the geometric median to denoise site models and divergence vectors to dynamically weight and selectively retain parameters, yielding a robust merged model without data exchange. On five real-world hospital datasets, DiVMerge outperforms site-specific and centralized baselines, exhibits strong zero-shot generalization across domain shifts, and demonstrates substantial improvements in accuracy and F1-score on unseen CHD cases, illustrating the practical value of privacy-preserving multi-site collaboration in fetal US CHD screening. The approach blends sparse, efficient video representations with principled, drift-resilient model aggregation to push toward clinically viable, privacy-compliant CHD detection. Key equations include the geometric median objective $\theta^* = \arg\min_{\theta} \sum_i w_i \|\theta_i - \theta\|_2$ and the divergence-guided weighting $\alpha_i = \exp(-\lambda \|\Delta_i\|_2)$ with $\Delta_i = \theta_i - \theta^*$, which underpin the DiVMerge mechanism and its selective parameter retention rule $\theta_f^p = \theta_i^p$ if $|\theta_i^p| \geq \gamma |\Delta_i^p|\theta^{*p}$, otherwise replaced by $\theta^{*p}$. Significance: this framework enables privacy-preserving, cross-site CHD detection in settings where data sharing is restricted, while maintaining or improving detection performance on unseen, domain-shifted data.
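As a rough illustration of the two quantities named above, the following NumPy sketch computes a weighted geometric median with Weiszfeld iterations and the divergence-based weights $\alpha_i = \exp(-\lambda \|\Delta_i\|_2)$. It treats each site model as a flattened parameter vector. The toy vectors, the value of `lam` (standing in for $\lambda$), the normalisation of the weights, and the final weighted average are assumptions made for illustration, not details confirmed by the paper, and the selective retention rule is not implemented here.

```python
import numpy as np

def geometric_median(thetas, weights=None, n_iter=200, eps=1e-8):
    """Weiszfeld iterations for argmin_theta sum_i w_i * ||theta_i - theta||_2."""
    thetas = np.asarray(thetas, dtype=float)
    w = np.ones(len(thetas)) if weights is None else np.asarray(weights, dtype=float)
    median = np.average(thetas, axis=0, weights=w)  # start from the weighted mean
    for _ in range(n_iter):
        # Clamp distances so a site lying exactly on the estimate does not divide by zero.
        dists = np.maximum(np.linalg.norm(thetas - median, axis=1), eps)
        coef = w / dists
        new_median = (coef[:, None] * thetas).sum(axis=0) / coef.sum()
        if np.linalg.norm(new_median - median) < eps:
            return new_median
        median = new_median
    return median

def divergence_weights(thetas, median, lam=1.0):
    """Divergence vectors Delta_i = theta_i - theta* and weights exp(-lam * ||Delta_i||_2)."""
    deltas = np.asarray(thetas, dtype=float) - median
    alphas = np.exp(-lam * np.linalg.norm(deltas, axis=1))
    return deltas, alphas

# Hypothetical flattened parameters for four sites; the last one has drifted.
site_params = np.array([[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [5.0, -3.0]])

median = geometric_median(site_params)
deltas, alphas = divergence_weights(site_params, median, lam=1.0)

# Assumption: merge by a normalised weighted average of the site parameters.
norm_alphas = alphas / alphas.sum()
merged = (norm_alphas[:, None] * site_params).sum(axis=0)
```

In this toy setup the drifted site receives a weight close to zero, so the merged vector stays near the three sites that agree, whereas a plain mean would be pulled toward the outlier.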

Abstract

Congenital Heart Disease (CHD) is one of the leading causes of fetal mortality, yet the scarcity of labeled CHD data and strict privacy regulations surrounding fetal ultrasound (US) imaging present significant challenges for the development of deep learning-based models for CHD detection. Centralised collection of large real-world datasets for rare conditions, such as CHD, from large populations requires significant co-ordination and resource. In addition, data governance rules increasingly prevent data sharing between sites. To address these challenges, we introduce, for the first time, a novel privacy-preserving, zero-shot CHD detection framework that formulates CHD detection as a normality modeling problem integrated with model merging. In our framework dubbed Sparse Tube Ultrasound Distillation (STUD), each hospital site first trains a sparse video tube-based self-supervised video anomaly detection (VAD) model on normal fetal heart US clips with self-distillation loss. This enables site-specific models to independently learn the distribution of healthy cases. To aggregate knowledge across the decentralized models while maintaining privacy, we propose a Divergence Vector-Guided Model Merging approach, DivMerge, that combines site-specific models into a single VAD model without data exchange. Our approach preserves domain-agnostic rich spatio-temporal representations, ensuring generalization to unseen CHD cases. We evaluated our approach on real-world fetal US data collected from 5 hospital sites. Our merged model outperformed site-specific models by 23.77% and 30.13% in accuracy and F1-score respectively on external test sets.
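Under the normality-modeling formulation, a test clip can be scored by how far its features lie from those of healthy clips, which is where the KNN classifier mentioned in the TL;DR comes in. The sketch below is one plausible way to do this and is not taken from the paper: the feature dimension, the cosine distance, the choice of `k`, and the synthetic feature vectors are all assumptions, and in practice the features would come from the merged video encoder.

```python
import numpy as np

def knn_anomaly_scores(normal_bank, queries, k=5):
    """Mean cosine distance from each query to its k nearest normal features."""
    bank = normal_bank / np.linalg.norm(normal_bank, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    dist = 1.0 - q @ bank.T                   # shape: (n_queries, n_bank)
    nearest = np.sort(dist, axis=1)[:, :k]    # k smallest distances per query
    return nearest.mean(axis=1)

# Synthetic stand-ins for clip-level features (hypothetical, 4-dimensional).
rng = np.random.default_rng(0)
normal_bank = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(200, 4))
normal_queries = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(10, 4))
abnormal_queries = rng.normal(loc=[0.0, 1.0, 0.0, 0.0], scale=0.05, size=(10, 4))

scores_normal = knn_anomaly_scores(normal_bank, normal_queries)
scores_abnormal = knn_anomaly_scores(normal_bank, abnormal_queries)
```

A clip would then be flagged as abnormal when its score exceeds a threshold chosen on held-out normal data. No CHD examples are needed to build the feature bank, which is what makes the detection zero-shot.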

Paper Structure

This paper contains 14 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: (a) t-SNE visualization (left) shows that our proposed method (after merging models trained on 3 sites) achieves nearly distinct clustering, suggesting well-separated feature representations. On the other hand, Model 3 (trained on Site 3) is observed to achieve low separability (right). (b) The quantitative comparison of both models evaluated on Site 2 further illustrates the benefit of our proposed model merging technique.
  • Figure 2: Overview of the proposed technique. Left figure shows self-supervised video anomaly network training at each site leveraging sparse-tube tokenizer and teacher-student model via self-distillation loss. This leads to the development of models $M_1$, $M_2$, and $M_3$ at three sites. The right figure shows the geometric median computation followed by estimation of divergence vectors for each site. The divergence vectors are then employed for selective parameter retention to reduce model drift and for adaptively weighting different models for final model merging.
  • Figure 3: Feature map visualization overlaid on sequential US frames, highlighting the model's capability to focus on key anatomical fetal heart structures for CHD
  • Figure 4: Confusion matrices illustrating the performance of various models on external sites 4 and 5. The results indicate that the Centralized Model and Model 1 struggle to detect most abnormal cases, while Model 3 frequently misclassifies normal cases as abnormal due to domain shift. In contrast, our model achieves the best performance, accurately distinguishing most normal and abnormal cases even with domain gap.