Neonatal Face and Facial Landmark Detection from Video Recordings

Ethan Grooby; Chiranjibi Sitaula; Soodeh Ahani; Liisa Holsti; Atul Malhotra; Guy A. Dumont; Faezeh Marzbanrad

TL;DR

This work targets automated neonatal face and facial landmark detection from video, a critical first step for non-contact neonatal health assessment. It combines transfer learning on two YOLO-based detectors (YOLOv5 and YOLOv7Face) with data augmentation and image re-orientation, evaluated on three public neonatal datasets comprising 455 annotated images from 324 neonates. The authors report that the retrained YOLOv7Face achieves strong face-detection performance ($AP_{50} = 100\%$, $mAP = 84.8\%$) and the best landmark accuracy ($NME = 0.072$ across all landmarks), outperforming existing neonatal methods; re-orientation improves several existing detectors, while model fusion provides only minor gains. These results advance automated neonatal health assessment pipelines, though runtime remains a consideration for real-time deployment, and future work should address neonatal face segmentation and mobile-optimised architectures.

Abstract

This paper explores automated face and facial landmark detection of neonates, which is an important first step in many video-based neonatal health applications, such as vital sign estimation, pain assessment, sleep-wake classification, and jaundice detection. Utilising three publicly available datasets of neonates in the clinical environment, 366 images (258 subjects) and 89 images (66 subjects) were annotated for training and testing, respectively. Transfer learning was applied to two YOLO-based models, with input training images augmented with random horizontal flipping, photometric colour distortion, translation and scaling during each training epoch. Additionally, the re-orientation of input images and the fusion of trained deep learning models were explored. Our proposed model based on YOLOv7Face outperformed existing methods with a mean average precision of 84.8% for face detection, and a normalised mean error of 0.072 for facial landmark detection. Overall, this will assist in the development of fully automated neonatal health assessment algorithms.
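
To make the landmark metric concrete, the following is a minimal sketch of how a normalised mean error can be computed for one face. It is not the authors' implementation: the abstract does not state which distance is used for normalisation, so the inter-ocular distance used here is an assumption, and the function name `normalised_mean_error`, the five-point landmark layout, and the pixel coordinates are all hypothetical.

```python
import math


def normalised_mean_error(predicted, ground_truth, norm_distance):
    """Mean Euclidean landmark error divided by a normalising distance.

    predicted, ground_truth: equal-length lists of (x, y) points.
    norm_distance: positive reference length (an assumption; the source
    does not say which normalisation is used).
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("landmark lists must be non-empty and equal in length")
    if norm_distance <= 0:
        raise ValueError("norm_distance must be positive")
    # Per-landmark Euclidean distance between prediction and annotation.
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors) / norm_distance


# Hypothetical five-point layout (two eyes, nose, two mouth corners), in pixels.
truth = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (35.0, 80.0), (65.0, 80.0)]
pred = [(33.0, 44.0), (70.0, 40.0), (50.0, 60.0), (35.0, 80.0), (65.0, 80.0)]

# Assumption: normalise by the distance between the two eye landmarks.
inter_ocular = math.dist(truth[0], truth[1])
nme = normalised_mean_error(pred, truth, inter_ocular)
print(f"NME = {nme:.3f}")
```

In this toy case only the first landmark is off (by 5 pixels), so the mean error is 1 pixel and, with an inter-ocular distance of 40 pixels, the result is 0.025. A dataset-level figure such as the reported 0.072 would typically be an average of per-face values, though the exact aggregation is not specified in the abstract.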
