VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance

Teng Wang, Haojun Jiang, Yuxuan Wang, Zhenguo Sun, Shiji Song, Gao Huang

TL;DR

Cardiac ultrasound quality depends on operator skill, creating a need for real-time guidance. The authors propose VA-Adapter, a parameter-efficient module that attaches to ultrasound foundation models and learns vision-action sequences for probe guidance while keeping the backbone frozen. They collect a large-scale dataset of 1.31 million image–action pairs from 178 subjects and train ten action-prediction heads to reach ten standard planes, using a GRU sequence encoder and Smooth L1 loss. Results show the VA-Adapter achieves superior guidance accuracy with substantial parameter savings and real-time inference, enabling practical deployment of foundation-model–driven probe guidance.
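The Smooth L1 loss named above can be illustrated with a minimal, dependency-free sketch. It follows the standard definition (quadratic for errors below a threshold `beta`, linear above it). The 6-component action vector, its values, and the choice `beta=1.0` are assumptions made for illustration and are not taken from the paper.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over the components of one action vector."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta  # quadratic region: small errors
        else:
            total += diff - 0.5 * beta         # linear region: large errors
    return total / len(pred)


# Hypothetical 6-component probe adjustment (values are illustrative only).
predicted = [0.2, -0.1, 0.0, 4.0, 0.5, -2.5]
expert = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
loss = smooth_l1(predicted, expert)
```

Because large errors contribute linearly rather than quadratically, a single badly predicted component influences the gradient less than it would under a squared-error loss.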

Abstract

Echocardiography is a critical tool for detecting heart disease. Recently, ultrasound foundation models have demonstrated remarkable capabilities in cardiac ultrasound image analysis. However, accurate diagnosis requires high-quality ultrasound images. Because cardiac ultrasound is exceptionally difficult to perform, highly skilled personnel are in short supply, which prevents patients from receiving timely examinations. In this paper, we aim to adapt the medical knowledge that foundation models learn from vast datasets to the probe guidance task, which provides real-time operational recommendations that help junior sonographers acquire high-quality ultrasound images. Moreover, inspired by how experts refine their action decisions based on past explorations, we design a parameter-efficient Vision-Action Adapter (VA-Adapter) that enables the foundation model's image encoder to encode vision-action sequences, thereby enhancing guidance performance. With built-in sequential reasoning capabilities in a compact design, the VA-Adapter enables a pre-trained ultrasound foundation model to learn precise probe adjustment strategies by fine-tuning only a small subset of parameters. Extensive experiments demonstrate that the VA-Adapter can surpass strong probe guidance models. Our code will be released after acceptance.
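To give a rough sense of why an adapter trains few parameters, the sketch below counts the weights of a hypothetical bottleneck adapter: a down-projection, a GRU over the bottleneck features, and an up-projection. This layout, the hidden width of 768, and the bottleneck width of 64 are all assumptions for illustration; the abstract does not specify the VA-Adapter's internal structure or sizes. The GRU count uses the common layout of three gates, each with input weights, recurrent weights, and two bias vectors.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in, n_hidden):
    """Single-layer GRU: 3 gates, each with input weights,
    recurrent weights, and two bias vectors."""
    return 3 * (n_hidden * n_in + n_hidden * n_hidden + 2 * n_hidden)


def adapter_params(d_model, d_bottleneck):
    """Hypothetical adapter: down-projection -> GRU -> up-projection."""
    down = linear_params(d_model, d_bottleneck)
    seq = gru_params(d_bottleneck, d_bottleneck)
    up = linear_params(d_bottleneck, d_model)
    return down + seq + up


# Assumed sizes, not taken from the paper.
d_model, d_bottleneck = 768, 64
per_adapter = adapter_params(d_model, d_bottleneck)
print(f"trainable parameters per adapter: {per_adapter:,}")
```

Under these assumed sizes, each adapter adds on the order of 10^5 trainable parameters, while the frozen backbone contributes none to the trainable count.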

Paper Structure

This paper contains 22 sections, 8 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Illustration of the dataset. (a) Large-scale diagnostic foundation model dataset. (b) Statistics of our dataset. (c) Standard planes. The view images are sourced from mitchell2019guidelines.
  • Figure 2: Illustration of the architecture of the VA-Adapter. The left side shows that we insert VA-Adapter into the deep layers of foundation models, and the right side shows the internal structure of VA-Adapter.
  • Figure 3: Performance comparison of different PEFT methods on USFM and BiomedCLIP.
  • Figure 4: Ablation study on vision-action interaction module of the EchoCLIP model.
  • Figure 5: Ablation study on adapter dimension of the EchoCLIP model.
  • ...and 1 more figure