Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning
Nachiket Kapure, Harsh Joshi, Rajeshwari Mistri, Parul Kumari, Manasi Mali, Seema Purohit, Neha Sharma, Mrityunjoy Panday, Chittaranjan S. Yajnik
TL;DR
This study tackles predicting fetal birth weight (BW) from a constrained, high‑dimensional PMNS dataset by designing a structured ML pipeline that combines advanced imputation (MICE with discrete KNN), extensive supervised feature selection (across filter, wrapper, embedded, and hybrid methods), and ensemble regression models. A comprehensive evaluation across 144 feature‑selector/model combinations identifies a BART‑based feature selector with MICE imputation and Gradient Boosting Regression as the top performer (R² ≈ 0.6217, RMSE ≈ 248.64 g), with gestational age at delivery and placental weight emerging as dominant predictors and a notable male–female birth weight difference of about 136.8 g. The results support the clinical value of data preprocessing and feature selection in high‑dimensional perinatal data and suggest that integrating such predictive analytics into prenatal care could improve risk stratification and decision‑making, while acknowledging residual errors in extreme cases and the need for broader predictor sets and multi‑center validation.
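As a rough illustration of the three pipeline stages named above (imputation, supervised feature selection, ensemble regression), the following minimal sketch chains them with scikit-learn. It is not the authors' implementation. The data are synthetic stand-ins, not the PMNS dataset. scikit-learn's `IterativeImputer` is used as a MICE-style imputer without the discrete KNN step. A random-forest-based `SelectFromModel` is swapped in for the BART-based selector, which scikit-learn does not provide. All sizes, coefficients, and thresholds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the PMNS dataset): 300 rows, 40 features,
# where only the first two columns actually drive the target (in grams).
n_samples, n_features = 300, 40
X = rng.normal(size=(n_samples, n_features))
y = 2800 + 300 * X[:, 0] + 200 * X[:, 1] + rng.normal(scale=100, size=n_samples)

# Remove roughly 10% of entries at random to mimic missing measurements.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    # MICE-style chained-equation imputation (assumption: default estimator).
    ("impute", IterativeImputer(max_iter=10, random_state=0)),
    # Embedded tree-based selection; a stand-in for the BART-based selector.
    ("select", SelectFromModel(
        RandomForestRegressor(n_estimators=100, random_state=0),
        threshold="median",
    )),
    # Ensemble regressor for the final prediction.
    ("model", GradientBoostingRegressor(random_state=0)),
])
pipeline.fit(X_train, y_train)

# Evaluate on held-out rows with the two metrics reported in the study.
y_pred = pipeline.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
selected_mask = pipeline.named_steps["select"].get_support()
n_selected = int(selected_mask.sum())

print(f"kept {n_selected} of {n_features} features, R^2 = {r2:.3f}, RMSE = {rmse:.1f} g")
```

Wrapping all three stages in a single `Pipeline` means the imputer and selector are fitted only on the training rows, which avoids leaking information from the held-out rows into the evaluation.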
Abstract
Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, tend to overlook complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to address these limitations, using a structured methodology that integrates advanced imputation strategies, supervised feature selection techniques, and predictive modeling. Given the constraints of the dataset, the study underscores the role of data preprocessing in improving model performance. Among the methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. Beyond model performance, the study highlights the clinical significance of key physiological determinants, offering insights into the maternal and fetal health factors that influence birth weight and that extend beyond statistical modeling. By bridging computational intelligence with perinatal research, this work underscores the transformative role of machine learning in enhancing predictive accuracy, refining risk assessment, and informing data-driven decision-making in maternal and neonatal care.
Keywords: Birth weight prediction, maternal-fetal health, MICE, BART, Gradient Boosting, neonatal outcomes, Clinipredictive.
