Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification
Satchel French, Faith Zhu, Amish Jain, Naimul Khan
TL;DR
This work addresses neonatal echocardiographic viewpoint classification by recasting it as video sequence classification rather than single-image labeling. It introduces Temporal Feature Weaving (TFW), a CNN-GRU architecture that weaves per-frame CNN features across time to form a temporal-spatial signature, yielding state-of-the-art accuracy (≈93.8%) and F1 (≈93.7%) on the Neonatal Echocardiogram Dataset (NED) with 16 viewpoints. A key contribution is the professionally labeled, open-source NED dataset, alongside an architecture that maintains a modest model size (~30 million parameters) suitable for real-time smartphone deployment. The findings demonstrate that incorporating temporal dynamics significantly improves viewpoint discrimination in neonatal echocardiography, and the work provides a practical, accessible resource to enhance screening and training in resource-limited settings.
Abstract
Automated viewpoint classification in echocardiograms can help under-resourced clinics and hospitals provide faster diagnosis and screening when expert technicians are not available. We propose a novel approach to echocardiographic viewpoint classification and show that treating it as video classification rather than image classification yields an advantage. We propose a CNN-GRU architecture with a novel temporal feature weaving method, which leverages both spatial and temporal information to yield a 4.33% increase in accuracy over baseline image classification while using only four consecutive frames. The proposed approach incurs minimal computational overhead. Additionally, we publish the Neonatal Echocardiogram Dataset (NED), a professionally annotated dataset providing sixteen viewpoints and associated echocardiography videos, to encourage future work and development in this field. Code available at: https://github.com/satchelfrench/NED
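The idea of classifying short clips of four consecutive frames, and of combining per-frame features across time, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `consecutive_clips` and `weave_features`, the stride parameter, and in particular the interleaving order used for "weaving" are assumptions made purely for illustration. The abstract does not specify how the weaving is performed, and in the actual architecture the per-frame features would come from a CNN and be passed to a GRU.

```python
from typing import List, Sequence

CLIP_LEN = 4  # the abstract reports using four consecutive frames


def consecutive_clips(frames: Sequence, clip_len: int = CLIP_LEN,
                      stride: int = 1) -> List[list]:
    """Split a video (a sequence of frames) into windows of consecutive frames.

    The stride is a hypothetical parameter; the abstract does not state one.
    Videos shorter than clip_len yield no clips.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    return [list(frames[i:i + clip_len])
            for i in range(0, len(frames) - clip_len + 1, stride)]


def weave_features(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Interleave per-frame feature vectors across time.

    ASSUMPTION: this interleaving is only one possible reading of "weaving".
    Input is T vectors of length D; output is one vector of length T * D in
    which feature d from every frame is placed side by side, in frame order.
    """
    if not per_frame:
        return []
    dim = len(per_frame[0])
    if any(len(v) != dim for v in per_frame):
        raise ValueError("all frames must have the same feature dimension")
    return [v[d] for d in range(dim) for v in per_frame]


if __name__ == "__main__":
    video = list(range(6))                  # stand-in for six frames
    print(consecutive_clips(video))         # three overlapping 4-frame clips
    toy_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(weave_features(toy_features))     # features grouped by index across time
```

In this sketch, each four-frame clip would be one training or inference sample, so a single video can contribute several samples to the classifier.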
