Cardiac valve event timing in echocardiography using deep learning and triplane recordings

Benjamin Strandli Fermann, John Nyberg, Espen W. Remme, Jahn Frederik Grue, Helén Grue, Roger Håland, Lasse Lovstakken, Håvard Dalen, Bjørnar Grenne, Svein Arne Aase, Sten Roar Snar, Andreas Østvik

TL;DR

This work addresses automated, precise valve event timing in echocardiography by using apical triplane recordings for ground-truth labeling and multi-view timing. It introduces two deep-learning architectures, a 3D CNN + 2 LSTM classifier and a ResNet-50 + 2 LSTM regressor, trained to detect six valve-related events in 4CH, 2CH, and APLAX views, with ground truth obtained from synchronized triplane data. The classification network generally yields higher accuracy (lower average absolute frame difference, aFD) than the regression model, with an aFD as low as 0.6 frames (12 ms) for mitral valve opening on internal data and at most 1.8 frames (30 ms) for the worst-performing event on external data. The study also reports low interobserver variability for the ground-truth annotations and shows the potential to improve clinical measurements by reducing dependence on external ECG or cross-modality timing. Overall, the method enables automatic detection of six cardiac events from standard apical views, supporting faster workflows and more comprehensive timing-related measurements in practice.

Abstract

Cardiac valve event timing plays a crucial role when conducting clinical measurements using echocardiography. However, established automated approaches are limited by the need for external electrocardiogram sensors, and manual measurements often rely on timing from different cardiac cycles. Recent methods have applied deep learning to cardiac timing, but they have mainly been restricted to detecting two key time points, namely end-diastole (ED) and end-systole (ES). In this work, we propose a deep learning approach that leverages triplane recordings to enhance detection of valve events in echocardiography. Our method demonstrates improved performance detecting six different events, including valve events conventionally associated with ED and ES. In a ten-fold cross-validation with test splits on triplane data from 240 patients, the average absolute frame difference (aFD) ranges from 0.6 frames (12 ms) for mitral valve opening to at most 1.4 frames (29 ms) for start of diastasis. On an external independent test set consisting of apical long-axis data from 180 other patients, the worst-performing event detection had an aFD of 1.8 frames (30 ms). The proposed approach has the potential to significantly impact clinical practice by enabling more accurate, rapid and comprehensive event detection, leading to improved clinical measurements.
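
As a rough illustration of the metric, the aFD can be read as the mean absolute difference between predicted and reference frame indices for one event type. The sketch below is not the authors' code; the function names, the example frame indices, and the 50 Hz frame rate are all hypothetical and serve only to show the arithmetic.

```python
def average_absolute_frame_difference(predicted, reference):
    """Mean of |predicted - reference| over paired event frame indices."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def frames_to_ms(frames, frame_rate_hz):
    """Convert a frame count to milliseconds at the given frame rate."""
    return 1000.0 * frames / frame_rate_hz


# Hypothetical frame indices of one event in five recordings (not study data).
predicted = [12, 40, 33, 27, 51]
reference = [12, 41, 33, 25, 51]

afd = average_absolute_frame_difference(predicted, reference)
# The frame rate is an assumed value; real recordings vary in frame rate.
afd_ms = frames_to_ms(afd, frame_rate_hz=50.0)
print(f"aFD = {afd:.1f} frames ({afd_ms:.0f} ms at an assumed 50 Hz)")
```

Because the millisecond value depends on the frame rate of each recording, the same aFD in frames can correspond to different durations across datasets.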

Paper Structure (23 sections, 3 figures, 9 tables)

Figures (3)

  • Figure 1: Annotations were performed on triplane data to provide the annotators with as much information as possible. The triplane views were then split and treated as three separate recordings with the same annotations before being used for training. The networks only train and evaluate on a single view at a time, so the resulting model is capable of predicting events from any regular single view 2D recording from any of the apical views.
  • Figure 2: (1) Expert reference and (2) predicted output for the deep learning architectures. (1a) For the classification network, each frame is labeled by its phase. Each event timing represents the first frame of the corresponding phase. (2a) The events are derived from the phase predictions by finding the transitions from one phase to the next. (1b) The regression network labels each frame by its event, or its proximity to a nearby event. Each label is independent and weights are reduced linearly with a maximum distance of five frames. (2b) The events are derived from the predictions by finding the peak of each event label.
  • Figure 3: Prediction error in frames for the (a) classification network and (b) regression network across all six different events. The histograms illustrate the proportional distributions of frame errors for each event.
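
The label schemes described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, phases are represented as plain integers, and the regression weight is assumed to reach zero at exactly five frames from the event (the caption only states a linear reduction with a maximum distance of five frames).

```python
def events_from_phases(phases):
    """Return (frame_index, new_phase) for each frame where the phase changes.

    Mirrors the idea that an event is the first frame of a new phase.
    """
    return [(i, phases[i]) for i in range(1, len(phases))
            if phases[i] != phases[i - 1]]


def soft_event_labels(num_frames, event_frame, max_distance=5):
    """Per-frame weight for one event: 1 at the event frame, decreasing
    linearly to 0 at max_distance frames away (assumed decay shape)."""
    return [max(0.0, 1.0 - abs(i - event_frame) / max_distance)
            for i in range(num_frames)]


def peak_frame(scores):
    """Index of the highest score; the first index wins on ties."""
    return max(range(len(scores)), key=scores.__getitem__)


# Classification-style output: one hypothetical phase label per frame.
phases = [0, 0, 0, 1, 1, 2, 2, 2]
transitions = events_from_phases(phases)

# Regression-style target for a hypothetical event at frame 6 of 12 frames.
labels = soft_event_labels(num_frames=12, event_frame=6)
recovered = peak_frame(labels)
```

Here `transitions` holds the frames at which a new phase starts, and `recovered` is the frame at which the soft label peaks. On real network output, the per-frame scores would be noisy, so some smoothing before peak picking may be needed; that step is omitted here.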