Prompt-driven Universal Model for View-Agnostic Echocardiography Analysis
Sekeun Kim, Hui Ren, Peng Guo, Abder-Rahman Ali, Patrick Zhang, Kyungsang Kim, Xiang Li, Quanzheng Li
TL;DR
This work tackles the challenge of echocardiography segmentation across multiple standard views by proposing a prompt-driven universal framework that eliminates view-specific models and view identification. It fuses a learnable prompt pool with pixel-text dense alignment guided by a pre-trained medical language model (ClinicalBERT) to generate view-adaptive segmentation heads, while handling partially labeled data via video masked back-propagation. The method demonstrates state-of-the-art performance among universal multi-view approaches across three datasets and multiple views, with ablations confirming the critical roles of pixel-text alignment and medical-text encoders. By unifying segmentation across views, the approach simplifies clinical workflows and offers scalable extension to additional views without retraining separate models.
Abstract
Echocardiography segmentation for cardiac analysis is time-consuming and resource-intensive because image quality varies and scans from several standard views must be processed. Current automated segmentation methods show promising performance, but each is trained on a specific scan view and can only analyze data from that view, so the number of required models grows with the number of standard views. To address this, we present a prompt-driven universal method for view-agnostic echocardiography analysis. To account for the domain shift between standard views, we first introduce prompt matching, which learns view-specific prompts by matching prompts to input embeddings queried from a pre-trained vision model. We then use a pre-trained medical language model to align textual information with pixel data for accurate segmentation. Extensive experiments on three standard views show that our approach significantly outperforms state-of-the-art universal methods and achieves performance comparable to, or even better than, segmentation models trained and tested on the same view.
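The two mechanisms named in the abstract, prompt matching and pixel-text alignment, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that matching is done by cosine similarity between a query embedding and learnable prompt keys with top-k selection, and that alignment assigns each pixel to the class whose text embedding is most similar. All function names, the `top_k` parameter, and the toy vectors are hypothetical. In practice the embeddings would come from a pre-trained vision model and a medical language model rather than hand-written lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_prompts(query, prompt_keys, top_k=1):
    """Assumed form of prompt matching: rank prompt keys by similarity to the query.

    Returns the indices of the top_k best-matching prompts and all scores.
    """
    scores = [cosine(query, key) for key in prompt_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k], scores


def pixel_text_labels(pixel_embeddings, text_embeddings):
    """Assumed form of pixel-text alignment: label each pixel with its closest class text."""
    labels = []
    for pixel in pixel_embeddings:
        sims = [cosine(pixel, text) for text in text_embeddings]
        labels.append(sims.index(max(sims)))
    return labels


# Toy example: one query embedding and a pool of three prompt keys (hypothetical values).
query = [1.0, 0.0, 0.2]
prompt_keys = [[0.9, 0.1, 0.1], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
selected, scores = match_prompts(query, prompt_keys, top_k=1)

# Toy example: three pixel embeddings scored against two class text embeddings.
pixels = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.3]]
class_texts = [[1.0, 0.0], [0.0, 1.0]]
labels = pixel_text_labels(pixels, class_texts)
```

Here the query lies closest to the first prompt key, so that prompt would be selected without any explicit view label, which is the sense in which such a scheme avoids a separate view-identification step.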
