EchoApex: A General-Purpose Vision Foundation Model for Echocardiography

Abdoul Aziz Amadou, Yue Zhang, Sebastien Piat, Paul Klein, Ingo Schmuecking, Tiziano Passerini, Puneet Sharma

TL;DR

EchoApex is introduced as the first general-purpose vision foundation model for echocardiography. It addresses a diverse range of clinical applications with high efficiency and efficacy, and illustrates the potential of a general-purpose vision foundation model tailored specifically to echocardiography.

Abstract

Quantitative evaluation of echocardiography is essential for precise assessment of cardiac condition, monitoring disease progression, and guiding treatment decisions. The diverse nature of echo images, including variations in probe types, manufacturers, and pathologies, poses challenges for developing artificial intelligence models that can generalize across different clinical practices. We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications across a variety of clinical practices. Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres. By incorporating task-specific decoders and adapter modules, we demonstrate the effectiveness of EchoApex on 4 different kinds of clinical applications with 28 sub-tasks, including view classification, interactive structure segmentation, left ventricle hypertrophy detection, and automated ejection fraction estimation from view sequences. Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture, demonstrating the benefits of model pretraining at scale with in-domain data. Furthermore, EchoApex illustrates the potential for developing a general-purpose vision foundation model tailored specifically for echocardiography, capable of addressing a diverse range of clinical applications with high efficiency and efficacy.

Paper Structure

This paper contains 29 sections, 9 figures, 14 tables.

Figures (9)

  • Figure 1: Overview of EchoApex. EchoApex is a general-purpose vision foundation model empowering a diverse range of clinical tasks in echocardiography. (a) Data curation process. A total of 450K videos are collected from 11 clinical centres, covering different image characteristics. (b) Evaluation of EchoApex on view classification, structure segmentation, ventricular measurement, and automated EF prediction. EchoApex shows superior performance to the task specialist in all evaluated tasks. (c) EchoApex pretraining with the state-of-the-art self-supervised training algorithm DINOv2. (d) EchoApex application in downstream tasks with optional parameter-efficient model fine-tuning.
  • Figure 2: Qualitative and quantitative evaluation of model pretraining. Left top: KNN accuracy at different training epochs during pretraining of the ViT-S model. Left bottom: KNN accuracy at different training epochs during pretraining of the ViT-B model. In both plots, the blue segment represents pretraining on Echo12M and the orange segment represents continual training on Echo20M. Right: Two-dimensional t-SNE visualization of the embeddings of 10K images. The encoder is ViT-B pretrained on Echo20M. A perplexity of 50 is used.
  • Figure 3: Study on sequence view classification. (a) Architecture for the classification task. (b) Confusion matrix showing EchoApex-B's performance over the 18 classes. (c) Performance comparison of EchoApex-S and the ResNet baseline. The table indicates the number of test sequences and whether or not EchoApex-S significantly outperforms the baseline. (d) Balanced accuracy (BACC) of the EchoApex models trained with different backbones, with and without adapters.
  • Figure 4: Study on interactive segmentation. (a) EchoApex attaches a prompt encoder and a mask decoder for the interactive segmentation task, taking prompts in the form of points, boxes, and text. (b) Number of annotations in all evaluated datasets. (c) Performance comparison between EchoApex-SAM variants and sub-task specialist models, e.g. UNet individually trained on each dataset, and the generalist model MedSAM trained on multi-modal medical images. Oracle box prompts are used in this experiment category. (d) Few-shot learning evaluation of EchoApex-S vs. its ImageNet-pretrained counterpart. Text prompts are used in this experiment category. (e) Generalization capability test of EchoApex-S compared with the specialist model DeepLabV3 on both in-domain (ENDym) and out-of-domain data.
  • Figure 5: Study on left ventricle hypertrophy detection. (a) Architecture for the landmark detection task. (b) Measurement distributions for the EchoNet-LVH (internal) and Unity (external) datasets. (c, d) Landmark error (mm) and (e, f) MAE (mm) distribution on the test datasets. $p<0.01$ is from a one-sided t-test showing a significant improvement of EchoApex-S over DeepLabV3. (g) Self-attention maps from the last block of the EchoApex encoder showing regions of high similarity between image tokens, alongside the predicted (red) and ground truth (green) landmark positions.
  • ...and 4 more figures
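
The captions for Figures 2 and 3 report KNN accuracy on frozen embeddings and balanced accuracy (BACC). As a rough illustration of how these two metrics can differ, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the cosine-similarity metric, the choice of k = 3, the toy 2-D embeddings, and the view labels "A4C" and "PLAX" are all assumptions made for this example, since the summary above does not state the KNN protocol.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(query, bank_embeddings, bank_labels, k=3):
    """Majority vote over the k bank embeddings most similar to the query."""
    ranked = sorted(
        range(len(bank_embeddings)),
        key=lambda i: cosine_similarity(query, bank_embeddings[i]),
        reverse=True,
    )
    votes = Counter(bank_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare classes weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: embeddings a frozen encoder might produce (assumption).
bank_embeddings = [(1.0, 0.1), (0.9, 0.2), (0.8, 0.0), (0.1, 1.0), (0.0, 0.9)]
bank_labels = ["A4C", "A4C", "A4C", "PLAX", "PLAX"]
queries = [(1.0, 0.0), (0.9, 0.1), (0.7, 0.2), (0.6, 0.5)]
query_labels = ["A4C", "A4C", "A4C", "PLAX"]

predictions = [knn_predict(q, bank_embeddings, bank_labels) for q in queries]
knn_accuracy = accuracy(query_labels, predictions)
bacc = balanced_accuracy(query_labels, predictions)
print(predictions, knn_accuracy, bacc)
```

In this toy setup the single minority-class query is misclassified, so plain accuracy stays at 0.75 while balanced accuracy drops to 0.5. This is one reason a class-imbalanced task such as view classification might be reported with BACC.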