Explainable fetal ultrasound quality assessment with progressive concept bottleneck models
Manxi Lin, Aasa Feragen, Kamil Mikolaj, Zahra Bashir, Martin Grønnebæk Tolsgaard, Anders Nymark Christensen
TL;DR
The paper tackles explainable fetal ultrasound quality assessment, where accurate identification of standard planes is essential yet difficult because of imaging artifacts. It introduces Progressive Concept Bottleneck Models (P-CBMs), a three-stage pipeline in which an observer predicts segmentation concepts ($x \xrightarrow{g} s$), a perceiver derives property concepts from them ($s \xrightarrow{l} c$), and a predictor outputs the final label ($c \xrightarrow{f} y$), so that predictions depend on human-interpretable ISUOG criteria. By grounding concepts in segmentations and ISUOG properties, the approach mitigates information leakage, provides faithful explanations, and generalizes to external datasets without fine-tuning, outperforming concept-free baselines. The results show both improved accuracy and actionable explanations that can give clinicians real-time guidance for optimizing image acquisition and downstream biometric measurements. Overall, P-CBM advances explainable, clinically aligned AI for fetal ultrasound, with potential for deployment across diverse centers and setups.
Abstract
The quality of fetal ultrasound screening scans directly influences the precision of biometric measurements. However, acquiring high-quality scans is labor-intensive and relies heavily on the operator's skills. Given the low contrast and imaging artifacts that are widespread in ultrasound, even a dedicated deep-learning model can be vulnerable to learning from confounding information in the image. In this paper, we propose a holistic and explainable method for fetal ultrasound quality assessment, in which we design a hierarchical concept bottleneck model by introducing human-readable ``concepts'' into the task and imitating the sequential expert decision-making process. This hierarchical information flow forces the model to learn concepts from semantically meaningful areas: information first passes through a layer of visual, segmentation-based concepts, and then through a second layer of property concepts directly associated with the decision-making task. We consider quality assessment in a more challenging but more realistic setting that involves fine-grained image recognition. Experiments show that our model outperforms equivalent concept-free models on an in-house dataset and generalizes better on two public benchmarks, one from Spain and one from Africa, without any fine-tuning.
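
The hierarchical information flow described above (image $x$, segmentation concepts $s$, property concepts $c$, quality label $y$) can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the paper each stage is a learned model, whereas here the intensity threshold, the concept names `structure_present` and `area_fraction`, and the predictor weights are hypothetical placeholders chosen only to show that each stage sees nothing but the output of the stage before it.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]    # grayscale pixel intensities in [0, 1]
Mask = List[List[int]]       # per-pixel label: 1 = structure, 0 = background
Concepts = Dict[str, float]  # named property concepts


def observer(x: Image) -> Mask:
    """g: x -> s. Toy stand-in for a segmentation network (hypothetical threshold)."""
    return [[1 if px > 0.5 else 0 for px in row] for row in x]


def perceiver(s: Mask) -> Concepts:
    """l: s -> c. Receives only the mask, never the raw image."""
    total = sum(len(row) for row in s)
    area = sum(sum(row) for row in s)
    return {
        "structure_present": float(area > 0),  # hypothetical concept
        "area_fraction": area / total,         # hypothetical concept
    }


def predictor(c: Concepts) -> float:
    """f: c -> y. Toy linear score over the property concepts (hypothetical weights)."""
    weights = {"structure_present": 0.5, "area_fraction": 0.5}
    return sum(weights[name] * c[name] for name in weights)


def p_cbm(x: Image) -> Tuple[Mask, Concepts, float]:
    """Compose the three stages and return every intermediate for inspection."""
    s = observer(x)
    c = perceiver(s)
    y = predictor(c)
    return s, c, y


if __name__ == "__main__":
    mask, concepts, score = p_cbm([[0.9, 0.1], [0.8, 0.2]])
    print(mask, concepts, score)
```

Because `perceiver` and `predictor` never receive the image, two images that yield the same mask necessarily receive the same concepts and the same score. That is the sense in which the bottleneck limits reliance on confounding pixel information, and returning `s` and `c` alongside `y` is what makes the decision inspectable.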
