Prompt-driven Universal Model for View-Agnostic Echocardiography Analysis

Sekeun Kim, Hui Ren, Peng Guo, Abder-Rahman Ali, Patrick Zhang, Kyungsang Kim, Xiang Li, Quanzheng Li

TL;DR

This work tackles the challenge of echocardiography segmentation across multiple standard views by proposing a prompt-driven universal framework that eliminates view-specific models and view identification. It fuses a learnable prompt pool with pixel-text dense alignment guided by a pre-trained medical language model (ClinicalBERT) to generate view-adaptive segmentation heads, while handling partially labeled data via video masked back-propagation. The method demonstrates state-of-the-art performance among universal multi-view approaches across three datasets and multiple views, with ablations confirming the critical roles of pixel-text alignment and medical-text encoders. By unifying segmentation across views, the approach simplifies clinical workflows and offers scalable extension to additional views without retraining separate models.
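
The partial-label handling mentioned above can be illustrated with a minimal NumPy sketch of a masked segmentation loss. It assumes (an assumption, not a detail stated here) that "video masked back-propagation" amounts to computing the loss only over frame/structure pairs that carry an annotation, so unlabeled frames or structures contribute zero gradient. The function name `masked_bce_loss`, the `(T, C, H, W)` layout, and the choice of binary cross-entropy are hypothetical and chosen only for illustration.

```python
import numpy as np


def masked_bce_loss(logits, labels, label_mask):
    """Mean binary cross-entropy over annotated (frame, structure) pairs only.

    Hypothetical sketch, not the paper's implementation.
    logits, labels: arrays of shape (T, C, H, W) for T frames and C structures.
    label_mask:     boolean array of shape (T, C); True where frame t has a
                    ground-truth mask for structure c.
    """
    # Numerically stable BCE with logits: max(x, 0) - x*y + log(1 + exp(-|x|)).
    per_pixel = (np.maximum(logits, 0.0) - logits * labels
                 + np.log1p(np.exp(-np.abs(logits))))
    per_map = per_pixel.mean(axis=(2, 3))  # (T, C): one loss per frame/structure
    weights = label_mask.astype(per_map.dtype)
    n_labeled = weights.sum()
    if n_labeled == 0:
        # No annotation anywhere in the clip: nothing to back-propagate.
        return 0.0
    # Unlabeled entries get weight 0, so they add nothing to the loss and,
    # under any autodiff framework, nothing to the gradient.
    return float((per_map * weights).sum() / n_labeled)
```

For example, in a clip where only two frames are annotated, `label_mask` would be True only for those frame indices; predictions for the intermediate frames then leave the loss, and hence the gradients, unchanged.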

Abstract

Echocardiography segmentation for cardiac analysis is time-consuming and resource-intensive due to the variability in image quality and the need to process scans from various standard views. While current automated segmentation methods in echocardiography show promising performance, they are trained on specific scan views and can only analyze data from those views. This approach is limited because the number of required models grows with the number of standard views. To address this, we present a prompt-driven universal method for view-agnostic echocardiography analysis. To account for the domain shift between standard views, we first introduce prompt matching, which learns view-specific prompts by matching prompts against query embeddings of the input extracted by a pre-trained vision model. We then use a pre-trained medical language model to align textual information with pixel data for accurate segmentation. Extensive experiments on three standard views show that our approach significantly outperforms state-of-the-art universal methods and achieves comparable or even better performance than segmentation models trained and tested on the same views.
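
To make the two mechanisms in the abstract more concrete, here is a minimal NumPy sketch, assuming a key-query prompt pool in the style of Learning to Prompt (L2P): each prompt has a learnable key, the frozen vision model's embedding of the input acts as the query, and the prompts whose keys have the highest cosine similarity to the query are selected. It also sketches dense pixel-text alignment as cosine-similarity score maps between per-pixel features and one text embedding per structure, such as embeddings a medical language model like ClinicalBERT could produce. All function names, shapes, `top_n`, and the temperature value are hypothetical; the paper's exact matching rule and loss may differ.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def match_prompts(query, prompt_keys, prompts, top_n=2):
    """Hypothetical L2P-style prompt matching.

    query:       (D,) embedding of the input from a frozen pre-trained encoder
    prompt_keys: (M, D) learnable keys, one per prompt in the pool
    prompts:     (M, L, D) learnable prompt tokens
    Returns the selected prompt tokens (top_n * L, D), their pool indices, and
    a matching loss (1 - mean cosine similarity) that would pull the chosen
    keys toward the query during training.
    """
    sims = l2_normalize(prompt_keys) @ l2_normalize(query)  # (M,) cosine similarities
    idx = np.argsort(-sims)[:top_n]  # pool indices of the top_n most similar keys
    selected = prompts[idx].reshape(-1, prompts.shape[-1])
    match_loss = float(1.0 - sims[idx].mean())
    return selected, idx, match_loss


def pixel_text_scores(pixel_emb, text_emb, temperature=0.07):
    """Hypothetical dense pixel-text alignment.

    pixel_emb: (D, H, W) per-pixel features from the image decoder
    text_emb:  (K, D) one embedding per structure description
    Returns (K, H, W) logits: scaled cosine similarity of each pixel to each text.
    """
    d, h, w = pixel_emb.shape
    pix = l2_normalize(pixel_emb.reshape(d, -1).T)  # (H*W, D)
    txt = l2_normalize(text_emb)                    # (K, D)
    return (txt @ pix.T).reshape(-1, h, w) / temperature
```

Under this assumed design, `match_loss` would be minimized alongside the segmentation loss so that keys drift toward the queries of the views that select them; at inference, the selected prompts condition the model without an explicit view label.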

Paper Structure (10 sections, 3 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Overall framework of our proposed universal model for view-agnostic segmentation. The pre-trained model and query model remain frozen, while the other models are trainable.
  • Figure 2: Qualitative visualization of segmentation results generated by our method and state-of-the-art universal methods on a representative image. Red and blue represent LV$_{endo}$ and LV$_{epi}$, respectively.
  • Figure 3: Comparison of mean model performance across different encoder designs.