Automated Interpretable 2D Video Extraction from 3D Echocardiography

Milos Vukadinovic, Hirotaka Ieki, Yuki Sahashi, David Ouyang, Bryan He

TL;DR

This work addresses the bottleneck of converting 3D echocardiography into standard 2D views by presenting an end-to-end pipeline that decodes 3D volumes, samples planes with a plane-based representation, and automatically selects eight standard views using landmark localization and a view classifier. The method achieves high-quality, diagnostically useful 2D videos from 3D data, validated by blinded cardiologist assessment (~96% accuracy) and strong AI-task performance on disease detection and structural measurement tracing. Key contributions include a publicly released decoding workflow, a robust plane-sampling framework, and evidence that 2D views extracted from 3D volumes can match the diagnostic utility of conventionally acquired 2D views, potentially accelerating adoption of 3D echocardiography in clinical practice.

Abstract

Although the heart has complex three-dimensional (3D) anatomy, conventional medical imaging with cardiac ultrasound relies on a series of 2D videos showing individual cardiac structures. 3D echocardiography is a developing modality that now offers adequate image quality for clinical use, with potential to streamline acquisition and improve assessment of off-axis features. We propose an automated method to select standard 2D views from 3D cardiac ultrasound volumes, allowing physicians to interpret the data in their usual format while benefiting from the speed and usability of 3D scanning. We reconstruct standard echocardiography views by combining a deep learning view classifier with downstream heuristics, some based on anatomical landmarks and others provided by cardiologists. This approach was validated by three cardiologists in blinded evaluation (96% accuracy on 1,600 videos from 2 hospitals). The downstream 2D videos were also validated in their ability to detect cardiac abnormalities using AI echocardiography models (EchoPrime and PanEcho), as well as in their ability to generate clinical-grade measurements of cardiac anatomy (EchoNet-Measurement). We demonstrated that the extracted 2D videos preserve spatial calibration and diagnostic features, allowing clinicians to obtain accurate real-world interpretations from 3D volumes. We release the code and a dataset of 29 3D echocardiography videos at https://github.com/echonet/3d-echo.

Paper Structure

This paper contains 29 sections, 18 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Task Overview: Decomposing 3D cardiac ultrasound volumes into standard 2D images. Left: 3D scan is acquired as a spherical pyramid. Right: Eight standard views can be extracted from the 3D scan.
  • Figure 2: Overview of the proposed view extraction method: 1) A segmentation model localizes key landmarks (A4C, LA, SAX, and LV length). 2) A plane search range is defined using cardiologist-provided heuristics and the detected landmarks. 3) A view classifier performs the search and automatically selects the best views.
  • Figure 3: Results from Cardiologist Assessment. Cardiologists were asked to assess the quality and view-correctness of 1,600 extracted videos (8 views × 100 videos per view × 2 institutions).
  • Figure 4: Measurement Tracing on Extracted Views. Left: Visualization of the traced measurements. Right: Scatterplots with correlation coefficients against ground truth across two datasets.
  • Figure 5: Decoding Bytes to Voxel Intensities. The first 4 bytes represent an integer indicating the total data size in bytes, followed by 4 bytes specifying the total number of frames. Next, a series of 4-byte integers (one for each frame) provides the starting byte offset of each frame within the stream. For each frame, the first 32 bytes are a CRC checksum, and the remaining bytes are the zlib-compressed voxel intensities of the 3D volume.
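
The byte layout described in the Figure 5 caption can be sketched in a few lines of Python. This is a minimal illustration, not the released decoder: the function names `decode_frames` and `encode_frames` are hypothetical, and the sketch assumes little-endian unsigned 32-bit integers, frame offsets measured from the start of the stream, and frames that end where the next one begins. None of these details are stated in the caption. The 32-byte checksum field is skipped rather than verified, because the caption does not say how it is computed.

```python
import struct
import zlib

CRC_FIELD_BYTES = 32  # per the caption: each frame starts with a 32-byte checksum field


def decode_frames(stream: bytes) -> list[bytes]:
    """Return the decompressed voxel bytes of every frame in `stream`."""
    # Header: total data size, then frame count (assumed little-endian uint32).
    _total_size, n_frames = struct.unpack_from("<II", stream, 0)
    # Offset table: one uint32 per frame, assumed relative to the stream start.
    offsets = struct.unpack_from(f"<{n_frames}I", stream, 8)
    # Assumption: a frame ends where the next begins; the last runs to the end.
    ends = list(offsets[1:]) + [len(stream)]
    frames = []
    for start, end in zip(offsets, ends):
        payload = stream[start + CRC_FIELD_BYTES:end]  # skip the checksum field
        frames.append(zlib.decompress(payload))
    return frames


def encode_frames(volumes: list[bytes]) -> bytes:
    """Build a synthetic stream in the same assumed layout (for testing only)."""
    # The checksum field is zero-filled here because its algorithm is unknown.
    bodies = [bytes(CRC_FIELD_BYTES) + zlib.compress(v) for v in volumes]
    header_len = 8 + 4 * len(bodies)
    offsets, pos = [], header_len
    for body in bodies:
        offsets.append(pos)
        pos += len(body)
    # Assumption: the total-size field counts the whole stream, header included.
    header = struct.pack(f"<II{len(bodies)}I", pos, len(bodies), *offsets)
    return header + b"".join(bodies)
```

Each returned frame is a flat byte string. Turning it into a 3D array would additionally require the volume dimensions and voxel data type, which the caption does not specify.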