
MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis

Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub

TL;DR

This work introduces Selective Repulsive Knowledge Distillation, which decomposes contrastive KD into diagonal and off-diagonal components: matched-pair alignment is preserved while the off-diagonal weight decays into negative values, repelling the student from the teacher's inter-class confusions and forcing it to discover architecturally native features.
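The decomposition described above can be sketched in NumPy under a few stated assumptions. The KD signal is taken to be a soft-target cross-entropy between the teacher's and the student's row-wise softmax over an N×N image–text similarity matrix. The names `selective_kd_loss` and `log_softmax` and the temperature `tau` are hypothetical. The paper's exact loss may differ; for example, it might normalize the off-diagonal terms separately. The sketch only shows how a single weight `beta` can turn the off-diagonal part from attraction (β > 0) into repulsion (β < 0) while leaving the diagonal part untouched.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable log-softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def selective_kd_loss(teacher_logits: np.ndarray,
                      student_logits: np.ndarray,
                      beta: float,
                      tau: float = 1.0):
    """Hypothetical diagonal / off-diagonal split of a contrastive KD loss.

    teacher_logits, student_logits: (N, N) image-to-text similarity matrices
    for a batch of N matched pairs; entry (i, i) is the matched pair.
    beta: off-diagonal weight (> 0 attracts, < 0 repels).
    Returns (total, l_diag, l_off).
    """
    p_t = np.exp(log_softmax(teacher_logits / tau, axis=1))   # teacher row distributions
    log_p_s = log_softmax(student_logits / tau, axis=1)       # student row log-probabilities
    ce = -p_t * log_p_s                                       # per-entry soft cross-entropy, (N, N)
    n = ce.shape[0]
    diag_mask = np.eye(n, dtype=bool)
    l_diag = ce[diag_mask].sum() / n    # matched-pair alignment, averaged over rows
    l_off = ce[~diag_mask].sum() / n    # mass the teacher puts on non-matching pairs
    return l_diag + beta * l_off, l_diag, l_off
```

With `beta=1` the two terms recombine into ordinary soft-target cross-entropy, so the split has no effect until `beta` moves away from 1. With `beta < 0` this particular off-diagonal term is unbounded below, because the student can drive its off-diagonal log-probabilities toward minus infinity. The sketch is therefore illustrative only; this section does not describe how the authors keep the repulsive phase stable.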

Abstract

Fetal ultrasound AI could transform prenatal care in low-resource settings, yet current foundation models exceed 300M visual parameters, precluding deployment on point-of-care devices. Standard knowledge distillation fails under such extreme capacity gaps (~26×), as compact students waste capacity mimicking architectural artifacts of oversized teachers. We introduce Selective Repulsive Knowledge Distillation, which decomposes contrastive KD into diagonal and off-diagonal components: matched-pair alignment is preserved while the off-diagonal weight decays into negative values, repelling the student from the teacher's inter-class confusions and forcing it to discover architecturally native features. Our 11.4M-parameter student surpasses the 304M-parameter FetalCLIP teacher on zero-shot HC18 biometry validity (88.6% vs. 83.5%) and brain sub-plane F1 (0.784 vs. 0.702), while running with 1.6 ms latency on an iPhone 16 Pro, enabling real-time assistive AI on handheld ultrasound devices. Our code, models, and app are publicly available at https://github.com/numanai/MobileFetalCLIP.
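For reference, the kind of coupled contrastive KD objective that the abstract contrasts against can be sketched in NumPy. Everything here is an illustrative assumption:

- the helper names;
- the fixed, hypothetical `logit_scale=100.0`;
- the row-wise KL term weighted by `lam_kl`;
- the assumption that teacher and student each produce their own image and text embeddings.

Because both models yield an N×N matrix for the same batch, the KD term can compare the two matrices directly even though the teacher's and the student's embedding widths differ.

```python
import numpy as np

def log_softmax_rows(x: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over each row.
    z = x - x.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def similarity_matrix(img_emb: np.ndarray, txt_emb: np.ndarray,
                      logit_scale: float = 100.0) -> np.ndarray:
    # Scaled cosine similarities: entry (i, j) scores image i against text j.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return logit_scale * img @ txt.T

def clip_loss(logits: np.ndarray) -> float:
    # Symmetric InfoNCE: matched pairs sit on the diagonal.
    i2t = -np.mean(np.diag(log_softmax_rows(logits)))
    t2i = -np.mean(np.diag(log_softmax_rows(logits.T)))
    return 0.5 * (i2t + t2i)

def coupled_kd_loss(teacher_logits: np.ndarray, student_logits: np.ndarray) -> float:
    # Row-wise KL(teacher || student), averaged over the batch.
    log_t = log_softmax_rows(teacher_logits)
    log_s = log_softmax_rows(student_logits)
    return np.mean(np.sum(np.exp(log_t) * (log_t - log_s), axis=1))

def total_loss(t_img, t_txt, s_img, s_txt, lam_kl: float = 1.0) -> float:
    # The teacher would be frozen in real training; NumPy has no autograd,
    # so this only evaluates the loss value.
    S_t = similarity_matrix(t_img, t_txt)
    S_s = similarity_matrix(s_img, s_txt)
    return clip_loss(S_s) + lam_kl * coupled_kd_loss(S_t, S_s)
```

Here a single weight multiplies the whole KL term, so diagonal and off-diagonal entries are always pulled toward the teacher together. That coupling is what the selective variant removes.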

Paper Structure (32 sections, 7 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Overview of the MobileFetalCLIP framework. (A) Distillation setup: a frozen FetalCLIP teacher (ViT-L/14, 304M visual params) produces an $N{\times}N$ similarity matrix for a batch of $N$ image–text pairs; a lightweight FastViT student (11.4M visual params) is trained via $\mathcal{L}_\mathrm{CLIP}$ and $\mathcal{L}_\mathrm{KD}$. (B) Attraction-to-repulsion dynamics: the off-diagonal weight $\beta(t)$ decays from $\beta_0$ into negative values, while the weight on the diagonal term $\mathcal{L}_\mathrm{diag}$ stays fixed, preserving matched-pair alignment throughout training. (C) Outcome: Selective Repulsive KD produces structured decorrelation, yielding better cluster separation, a higher HC18 validity rate, and a higher brain sub-plane F1 with 26$\times$ fewer visual parameters.
  • Figure 2: Training dynamics for representative KD configurations. (a) KD weight schedule over epochs: for coupled runs the weight is $\lambda_{\mathrm{KL}}(t)$; for selective mode it is $\beta(t)$. Positive-decay runs stay above zero, while repulsive variants cross into the repulsive zone (weight${<}0$). (b) Zero-shot Avg.$^\ddag$ per epoch: repulsive runs show a characteristic late surge once they enter the repulsive zone; Selective Repulsive KD ($\beta_0{=}2$, $r{=}{-}0.8$) achieves the highest final score ($\star$), exceeding the FetalCLIP teacher.
  • Figure 3: t-SNE projections of brain sub-plane embeddings (transthalamic, transcerebellum, transventricular). (a) No KD: overlapping clusters. (b) Static KD: marginal improvement. (c) Selective Repulsive KD: well-separated, compact clusters.
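The attraction-to-repulsion schedule in Figures 1(B) and 2(a) can be illustrated with a simple linear decay. This sketch makes two assumptions that the captions do not confirm. First, the decay is linear in the epoch index. Second, `r` is read as the ratio of the final weight to the initial weight, so $\beta_0 = 2$ and $r = -0.8$ would end at $\beta = -1.6$. The function name `beta_schedule` is hypothetical.

```python
def beta_schedule(epoch: int, total_epochs: int,
                  beta0: float = 2.0, r: float = -0.8) -> float:
    """Hypothetical linear schedule from beta0 (first epoch) to r * beta0 (last epoch).

    Assumes r is the final-to-initial weight ratio; the paper's actual
    parameterization of r may differ.
    """
    frac = epoch / max(total_epochs - 1, 1)   # training progress in [0, 1]
    return beta0 * (1.0 + (r - 1.0) * frac)

# Epochs whose weight is negative fall in the "repulsive zone".
repulsive_epochs = [e for e in range(21) if beta_schedule(e, 21) < 0]
```

Under this reading, any r in (0, 1) gives a positive decay that never crosses zero. Any r < 0 produces a repulsive phase that begins once β(t) changes sign. These two cases correspond to the two families of curves described for Figure 2(a).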