NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry

Yash Khandelwal; Mayur Arvind; Sriram Kumar; Ashish Gupta; Sachin Kumar Danisetty; Piyush Bagad; Anish Madan; Mayank Lunayach; Aditya Annavajjala; Abhishek Maiti; Sansiddh Jain; Aman Dalmia; Namrata Deka; Jerome White; Jigar Doshi; Angjoo Kanazawa; Rahul Panicker; Alpan Raval; Srinivas Rana; Makarand Tapaswi

TL;DR

The paper tackles malnutrition screening in newborns, a public health challenge, by enabling contactless anthropometry in rural settings in low- and middle-income countries (LMICs). It introduces NurtureNet, a video-based, multi-task regression framework that takes RGB video from a low-cost smartphone and augments the visual features with birth weight and age to predict weight, length, head circumference, and chest circumference. For weight, this is formalized as $w = \mathrm{MLP}_w([z, w^0, a])$, where $z$ denotes the visual features, $w^0$ the birth weight, and $a$ the age. Two proxy vision tasks, segmentation and keypoint prediction, are trained on pseudo-labels to improve the learned representation. The model achieves a weight MAE of $114.3$ g and a relative error of $3.9\%$, and it compresses to about $15$ MB, which allows offline deployment on low-cost smartphones. Experiments on data collected in rural field settings (12,901 videos) show robustness to noisy tabular inputs and a substantial improvement over conventional practice (e.g., an MAE of $183$ g for spring-balance readings). The approach offers a scalable, geo-tagged, contactless way to monitor newborn growth and inform timely interventions, with potential for large-scale impact in public health programs.
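The fusion step $w = \mathrm{MLP}_w([z, w^0, a])$ can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the feature size, hidden size, normalization bounds, and random weights below are all assumptions chosen for illustration, and `z` is a random stand-in for the pooled video feature that a CNN would produce.

```python
import random

def min_max_normalize(value, lo, hi):
    """Scale a tabular value into [0, 1]. The bounds are assumed, not from the paper."""
    return (value - lo) / (hi - lo)

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(rng, n_in, n_out):
    """Random (untrained) weights, for illustration only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_w(features, params):
    """Two-layer MLP regressor standing in for MLP_w."""
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(features, w1, b1))
    return linear(hidden, w2, b2)[0]

rng = random.Random(0)
z_dim, hidden_dim = 8, 4                          # hypothetical sizes
z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]   # stand-in for the video feature z
w0 = min_max_normalize(2900.0, 1000.0, 5000.0)    # birth weight in g, assumed bounds
a = min_max_normalize(14.0, 0.0, 42.0)            # age in days, assumed bounds

fused = z + [w0, a]                               # [z, w^0, a] by concatenation
params = (init_layer(rng, z_dim + 2, hidden_dim), init_layer(rng, hidden_dim, 1))
w_hat = mlp_w(fused, params)                      # untrained, so the value is meaningless
print(len(fused), w_hat)
```

Because the weights are untrained, only the shapes and the order of concatenation matter here; in a multi-task setup, the other measurements would each get their own regressor over the same fused vector.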

Abstract

Malnutrition among newborns is a top public health concern in developing countries. Identification and subsequent growth monitoring are key to successful interventions. However, this is challenging in rural communities where health systems tend to be inaccessible and under-equipped, with poor adherence to protocol. Our goal is to equip health workers and public health systems with a solution for contactless newborn anthropometry in the community. We propose NurtureNet, a multi-task model that fuses visual information (a video taken with a low-cost smartphone) with tabular inputs to regress multiple anthropometry estimates including weight, length, head circumference, and chest circumference. We show that visual proxy tasks of segmentation and keypoint prediction further improve performance. We establish the efficacy of the model through several experiments and achieve a relative error of 3.9% and mean absolute error of 114.3 g for weight estimation. Model compression to 15 MB also allows offline deployment to low-cost smartphones.
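As a rough guide to how the two reported metrics relate, the sketch below computes a mean absolute error in grams and a mean relative error in percent. The per-sample definition of relative error (absolute error divided by the ground-truth weight, then averaged) is an assumption, since the text above does not spell it out, and the weights are made-up values rather than data from the paper.

```python
def mean_absolute_error(pred, truth):
    """Average absolute difference, in the units of the inputs (grams here)."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def mean_relative_error(pred, truth):
    """Average of |pred - truth| / truth, in percent (assumed definition)."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred, truth)) / len(truth)

# Made-up example weights in grams, for illustration only.
truth_g = [2500.0, 3000.0, 4000.0]
pred_g = [2600.0, 2900.0, 4100.0]

mae = mean_absolute_error(pred_g, truth_g)
rel = mean_relative_error(pred_g, truth_g)
print(f"MAE = {mae:.1f} g, relative error = {rel:.2f}%")
```

The same absolute error weighs more heavily on lighter newborns under the relative metric, which is one reason to report both.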

Paper Structure (52 sections, 8 equations, 7 figures, 13 tables)

Introduction
Related Work
Computer vision for newborns
Pose estimation
3D parametric models for adults and infants.
Tabular methods.
Method
Video-based Anthropometry
How to record a video?
Video-based weight estimation.
Multi-task Learning
Anthropometric measurements.
Visual prediction tasks.
Augmenting with Tabular Information
Experiments
...and 37 more sections

Figures (7)

Figure 1: Illustration contrasting traditional approaches (a-c) for newborn anthropometry to what our proposed solution (d) enables. (a) A measuring tape is used to measure head and chest circumference. (b) An infantometer is used to capture length. (c) The newborn is suspended from a cloth and hooked up to a spring balance to measure weight. (d) Our proposed solution replaces all the above tasks and only requires the data collector to take a short video with a low-cost smartphone.
Figure 2: Overview of the proposed approach. Input video frames are sub-sampled and processed using a CNN and fused using a pooling module. Tabular data is normalized between $[0, 1]$ and concatenated to this video representation. We use independent MLP regressors to predict anthropometry measures: weight, length, head circumference, and chest circumference. Additionally, we introduce two proxy tasks only used during training: newborn pixel segmentation predicted through an FCN head and keypoint estimation through a simple MLP.
Figure 3: Left: Weight distribution for the training set. Middle: Impact of varying the number of frames $N$ during evaluation on the validation set. For training, we use $N{=}25$. The model used here is NurtureNet, which augments video information with tabular data and uses the proxy tasks of baby segmentation and keypoint prediction. Right: Effect on validation-set weight MAE of adding noise, sampled from a uniform distribution, to the birth weight for NurtureNet models.
Figure 4: Scatter plot showing predicted weight vs. ground-truth weight for NurtureNet on the test set. The best-fit line (least squares) lies close to the $y {=} x$ diagonal, indicating that predictions closely track the ground truth. $R^2$ is the coefficient of determination and $PCC$ is the Pearson correlation coefficient.
Figure 5: Video recording process followed by health workers to capture the newborn from multiple viewing angles.
...and 2 more figures
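
The two statistics named in the Figure 4 caption, $R^2$ and $PCC$, can be computed as in the sketch below. This is a generic illustration with made-up weights, not the authors' evaluation code, and it assumes the common definition of $R^2$ as one minus the ratio of residual to total sum of squares around the ground-truth mean.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def r_squared(truth, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot

# Made-up weights in grams: every prediction is 100 g too high.
truth_g = [2000.0, 3000.0, 4000.0]
pred_g = [2100.0, 3100.0, 4100.0]

pcc = pearson_corr(truth_g, pred_g)
r2 = r_squared(truth_g, pred_g)
print(f"PCC = {pcc:.3f}, R^2 = {r2:.3f}")
```

A constant offset leaves the correlation at its maximum but lowers $R^2$, so the two numbers together say more than either alone about how close the fit is to the $y {=} x$ diagonal.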
