Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study
Md Muhtasim Munif Fahim, Md Rezaul Karim
TL;DR
This study tackles look-ahead bias in child-mortality prediction by using strictly temporally separated BDHS cohorts from Bangladesh (training: 2011–2014, validation: 2017, test: 2022). It combines domain-informed feature engineering with Neural Architecture Search (NAS), which yields a simple yet robust single-layer neural network (64 units, ELU, 30% dropout, batch normalization) that achieves AUROC $0.766$ on the 2022 test set, outperforming XGBoost and logistic regression. A notable finding is an equity gradient: division-level wealth correlates negatively with discrimination (Pearson $r=-0.62$), with higher performance in poorer divisions (AUC $>0.72$) and lower performance in wealthier ones. At a 10% screening threshold, the NAS model identifies $42.3\%$ of future deaths versus $41.1\%$ for the next-best model, which translates to roughly 1300 additional at-risk children identified annually. Platt scaling substantially improves calibration (Brier $0.029$), SHAP confirms epidemiologically plausible risk factors, and the results support region-tailored interventions. Future work should extend to multi-country validation and to causal inference for estimating intervention effects.
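The screening figures above (share of future deaths found among the 10% of children with the highest predicted risk) can be illustrated with a minimal sketch. The function name `capture_rate`, its arguments, and the toy data are assumptions made for illustration; they are not taken from the study's code or data.

```python
def capture_rate(scores, labels, fraction=0.10):
    """Share of all positive cases that fall in the top `fraction` of scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("no positive cases to capture")
    # Number of children flagged for screening (at least one).
    n_screened = max(1, int(len(scores) * fraction))
    # Rank children from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_screened])
    return captured / total_positives


# Toy data only (hypothetical): ten children, two deaths, screen the top 10%.
toy_scores = [0.90, 0.80, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.05]
toy_labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(capture_rate(toy_scores, toy_labels, fraction=0.10))
```

Under this definition, two models can be compared at the same screening budget by calling the function on each model's scores with the same labels. How ties in predicted risk are broken is a design choice the sketch leaves to the sort order.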
Abstract
Machine learning models for child mortality prediction tend to be inaccurate when applied to future populations because random cross-validation splits introduce look-ahead bias. This paper uses Demographic and Health Surveys (DHS) data from Bangladesh for 2011–2022 (n = 33,962). We trained the model on 2011–2014 data, validated it on 2017 data, and tested it on 2022 data. On the 2022 test set, eight years after the end of the training period, a single-layer neural architecture (64 units) found by a genetic-algorithm-based Neural Architecture Search outperformed XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A detailed fairness audit further identified a "Socioeconomic Predictive Gradient": a negative correlation between regional wealth and the model's AUC (r = -0.62). The model performed best in the least affluent divisions (AUC 0.74) and markedly worse in the wealthiest divisions (AUC 0.66). These findings suggest that the model is most accurate in the areas with the greatest need for intervention. When screening at the 10% level, our model would identify approximately 1300 more at-risk children annually than a gradient boosting model. Validated with SHAP values and Platt calibration, it provides a robust, production-ready computational phenotype for targeted maternal and child health interventions.
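The temporal separation described above can be sketched as a split keyed on survey year rather than on random row assignment. This is a minimal illustration, not the authors' pipeline: the function `temporal_split`, the field name `survey_year`, and the toy records are all hypothetical.

```python
def temporal_split(records, year_key="survey_year"):
    """Assign each record to train/validation/test strictly by survey year."""
    splits = {"train": [], "validation": [], "test": []}
    for record in records:
        year = record[year_key]
        if 2011 <= year <= 2014:
            splits["train"].append(record)
        elif year == 2017:
            splits["validation"].append(record)
        elif year == 2022:
            splits["test"].append(record)
        else:
            # Refuse to guess: an unexpected year could leak across cohorts.
            raise ValueError(f"unexpected survey year: {year}")
    return splits


# Toy records only (hypothetical field names and values).
toy_records = [
    {"child_id": 1, "survey_year": 2011},
    {"child_id": 2, "survey_year": 2014},
    {"child_id": 3, "survey_year": 2017},
    {"child_id": 4, "survey_year": 2022},
    {"child_id": 5, "survey_year": 2022},
]
splits = temporal_split(toy_records)
print({name: len(rows) for name, rows in splits.items()})
```

Because every training record predates every validation record, and every validation record predates every test record, no information from a later survey can influence model selection. A random cross-validation split cannot give that guarantee.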
