Two-Stream Thermal Imaging Fusion for Enhanced Time of Birth Detection in Neonatal Care
Jorge García-Torres, Øyvind Meinich-Bache, Sara Brunner, Siren Rettedal, Vilde Kolstad, Kjersti Engan
TL;DR
The paper addresses inaccuracies in manual Time of Birth (ToB) documentation in neonatal care by proposing a two-stream fusion framework that processes thermal imaging through a static image stream (EfficientNet) and a dynamic video stream (MoViNet) to produce a fusion score $p_{fusion}$. A score-aggregation module with a lightweight LSTM combines $p_{fusion}$ and an image-based score $p_{vnb}$ to estimate ToB as $\hat{T}_{birth}=\arg\max_t \hat{y}_{joint}(t)$ with a threshold $\gamma=0.95$, improving robustness to missed detections. The authors validate on a dataset of 611 thermal birth videos (and 258 image-based samples) from Stavanger University Hospital, using manual second-precision ToB annotations and VNB labels. Evaluation on 35 test videos shows 100% ToB detection with a median absolute error of 2 seconds and an absolute mean deviation of 4.5 seconds, along with strong precision and recall and a runtime efficient enough for real-time NRAA timelines. This approach advances privacy-preserving birth identification via thermal imaging and provides a practical, scalable pathway toward automated, detailed resuscitation timelines (NRAA) in neonatal care.
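The ToB rule above, an argmax over the joint score with a threshold $\gamma$, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `estimate_tob`, the per-second `timestamps` input, and the reading of $\gamma$ as a minimum peak score below which no birth is reported are all assumptions made for illustration. The LSTM that produces the joint score is not modeled here.

```python
from typing import Optional, Sequence


def estimate_tob(
    y_joint: Sequence[float],
    timestamps: Sequence[float],
    gamma: float = 0.95,
) -> Optional[float]:
    """Return the timestamp of the highest joint score, or None.

    Assumption: gamma acts as a minimum peak score; if no score
    reaches it, no ToB is reported for the recording.
    """
    if len(y_joint) != len(timestamps):
        raise ValueError("y_joint and timestamps must have the same length")
    if len(y_joint) == 0:
        return None

    # Index of the maximum joint score (first occurrence on ties).
    best = max(range(len(y_joint)), key=lambda i: y_joint[i])

    # Reject the peak if it does not reach the threshold.
    if y_joint[best] < gamma:
        return None
    return timestamps[best]


# Hypothetical scores sampled once per second.
scores = [0.10, 0.40, 0.97, 0.99, 0.60]
seconds = [0, 1, 2, 3, 4]
print(estimate_tob(scores, seconds))
```

Returning `None` when the peak stays below the threshold keeps a missed detection distinct from a confident estimate.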
Abstract
Around 10% of newborns require some help to initiate breathing, and 5% need ventilation assistance. Accurate Time of Birth (ToB) documentation is essential for optimizing neonatal care, as timely interventions are vital for proper resuscitation. However, current clinical methods for recording ToB often rely on manual processes, which can be prone to inaccuracies. In this study, we present a novel two-stream fusion system that combines the power of image and video analysis to accurately detect the ToB from thermal recordings in the delivery room and operating theater. By integrating static and dynamic streams, our approach captures richer birth-related spatiotemporal features, leading to more robust and precise ToB estimation. We demonstrate that this synergy between data modalities enhances performance over single-stream approaches. Our system achieves 95.7% precision and 84.8% recall in detecting birth within short video clips. Additionally, with the help of a score aggregation module, it successfully identifies ToB in 100% of test cases, with a median absolute error of 2 seconds and an absolute mean deviation of 4.5 seconds compared to manual annotations.
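The error figures reported above compare predicted and manually annotated ToB per recording. A minimal sketch of how such summary statistics could be computed follows. The function name `tob_error_stats` and the sample values are hypothetical and not taken from the paper's data, and the second statistic is assumed to be the mean of the absolute errors.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def tob_error_stats(
    predicted: Sequence[float],
    annotated: Sequence[float],
) -> Tuple[float, float]:
    """Return (median absolute error, mean absolute error) in seconds."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one recording is required")

    # Absolute difference between prediction and annotation per recording.
    errors = [abs(p - a) for p, a in zip(predicted, annotated)]
    return median(errors), mean(errors)


# Hypothetical ToB values in seconds from the start of each recording.
pred = [10, 22, 35]
manual = [12, 21, 30]
med_err, mean_err = tob_error_stats(pred, manual)
print(med_err, mean_err)
```

The median is less sensitive than the mean to a few recordings with large errors, which is one reason the two values can differ noticeably.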
