
Child Mortality Prediction in Bangladesh: A Decade-Long Validation Study

Md Muhtasim Munif Fahim, Md Rezaul Karim

TL;DR

This study tackles look-ahead bias in child-mortality prediction by using strictly temporally separated BDHS cohorts from Bangladesh (training: 2011–2014, validation: 2017, test: 2022). It combines domain-informed feature engineering with Neural Architecture Search (NAS) to yield a simple yet robust single-layer neural network (64 units, ELU, 30% dropout, batch normalization) achieving AUROC $0.766$ on the 2022 test set, outperforming XGBoost and logistic regression. A notable finding is an equity gradient where Pearson $r=-0.62$ links division wealth to discrimination, with higher performance in poorer divisions (AUC $>0.72$) and lower in wealthier regions; at a 10% screening threshold, NAS identifies $42.3\%$ of future deaths versus $41.1\%$ for the next-best model, translating to roughly 1300 additional at-risk children identified annually. Calibration is substantially improved by Platt scaling (Brier $0.029$), SHAP confirms epidemiologically plausible risk factors, and results support region-tailored interventions; future work should extend to multi-country validation and causal inference to estimate intervention effects.
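The "share of future deaths identified at a 10% screening threshold" can be read as the fraction of all deaths that fall among the 10% of children with the highest predicted risk. The following is a minimal sketch of that metric under this reading; the function name `capture_at_threshold` and the toy scores and labels are hypothetical and are not taken from the study.

```python
def capture_at_threshold(scores, labels, fraction=0.10):
    """Share of all positive cases found among the top `fraction` of scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_flagged = max(1, round(len(scores) * fraction))
    # Rank children from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    flagged_positives = sum(label for _, label in ranked[:n_flagged])
    total_positives = sum(labels)
    return flagged_positives / total_positives if total_positives else 0.0


# Hypothetical toy data: 20 children, 4 deaths (label 1).
scores = [0.91, 0.85, 0.40, 0.33, 0.30, 0.28, 0.25, 0.22, 0.20, 0.18,
          0.15, 0.14, 0.12, 0.11, 0.10, 0.09, 0.08, 0.07, 0.05, 0.03]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Screening the top 10% flags 2 children, one of whom died: 1 of 4 deaths.
capture = capture_at_threshold(scores, labels, fraction=0.10)
print(f"capture at 10%: {capture:.2f}")
```

Comparing two models on this metric at the same threshold holds the screening budget fixed, so a difference in capture translates directly into additional children reached.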

Abstract

Predictive machine learning models for child mortality tend to be inaccurate when applied to future populations, because the random splits used in cross-validation introduce look-ahead bias. This paper uses Demographic and Health Surveys (DHS) data from Bangladesh for 2011-2022 (n = 33,962). We trained the model on 2011-2014 data, validated it on 2017 data, and tested it on 2022 data. On the 2022 test cohort, eight years after the end of the training period, a single-layer neural architecture (64 units) found by a genetic algorithm-based Neural Architecture Search outperformed XGBoost (AUROC = 0.76 vs. 0.73; p < 0.01). A detailed fairness audit further identified a "Socioeconomic Predictive Gradient": a negative correlation between regional wealth and the algorithm's AUC (r = -0.62). The model performed best in the least affluent divisions (AUC 0.74) and markedly worse in the wealthiest divisions (AUC 0.66). These findings suggest that the model discriminates best in the areas with the greatest need for intervention. When screening at the 10% level, our model would identify approximately 1300 more at-risk children annually than a Gradient Boosting model. Validated using SHAP values and Platt calibration, it therefore provides a robust, production-ready computational phenotype for targeted maternal and child health interventions.
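The temporal design described in the abstract assigns every record to a split by survey year rather than at random, so no information from a later cohort can leak into training. A minimal sketch of such a split follows; the record layout, the field names (`survey_year`, `died`), and the helper names are illustrative assumptions, not DHS variable names or the authors' code.

```python
from collections import defaultdict


def assign_split(survey_year):
    """Map a survey year to a split using the cohort boundaries in the abstract."""
    if 2011 <= survey_year <= 2014:
        return "train"
    if survey_year == 2017:
        return "validation"
    if survey_year == 2022:
        return "test"
    raise ValueError(f"unexpected survey year: {survey_year}")


def temporal_split(records, year_key="survey_year"):
    """Group records by split; no shuffling or random assignment is involved."""
    splits = defaultdict(list)
    for record in records:
        splits[assign_split(record[year_key])].append(record)
    return dict(splits)


# Hypothetical toy records, for illustration only.
records = [
    {"child_id": 1, "survey_year": 2011, "died": 0},
    {"child_id": 2, "survey_year": 2014, "died": 1},
    {"child_id": 3, "survey_year": 2017, "died": 0},
    {"child_id": 4, "survey_year": 2022, "died": 0},
    {"child_id": 5, "survey_year": 2022, "died": 1},
]
splits = temporal_split(records)

# Guard against look-ahead bias: each split must strictly precede the next in time.
latest_train = max(r["survey_year"] for r in splits["train"])
earliest_val = min(r["survey_year"] for r in splits["validation"])
latest_val = max(r["survey_year"] for r in splits["validation"])
earliest_test = min(r["survey_year"] for r in splits["test"])
assert latest_train < earliest_val <= latest_val < earliest_test
```

Any preprocessing statistics (imputation values, scaling parameters) would likewise need to be computed on the training split only for the simulation to remain leak-free.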

Paper Structure (13 sections, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Framework for Study Methodology. A detailed workflow showing the pipeline for predicting child mortality. (A) Data Acquisition: Four demographic surveys (n=33,962) were harmonized. (B) Feature Engineering: 31 literature-based predictors were derived. (C) Temporal Validation: A strict deployment simulation, training on data from 2011 to 2014 and testing on the unobserved 2022 cohort. (D) Model Development: Statistical and deep learning techniques were benchmarked, culminating in Neural Architecture Search (NAS).
  • Figure 2: Neural Architecture Search Process and Optimal Architecture. (A) Evolutionary Progress: The optimization of the Genetic Algorithm over 15 generations, displaying the average candidate fitness convergence (dashed line) and the identification of the optimal architecture (solid line, AUC=0.766). (B) Optimal Architecture: The neural network topology that was found to maximize generalization on tabular epidemiological data. It has a 64-unit dense layer with ELU activation, 30% dropout, and batch normalization.
  • Figure 3: The Equity Gradient. A regional validation scatter plot showing a strong negative correlation (r = -0.62) between division-level wealth (x-axis) and NAS model discrimination (y-axis). In poorer, high-mortality divisions (like Sylhet and Rangpur), where deaths are caused by visible structural deficiencies, the model performs better (AUC > 0.72). In wealthier areas (like Dhaka), where mortality is lower and more frequently caused by random biological factors, performance deteriorates.
  • Figure 4: Model Calibration Curves. Calibration plots comparing predicted probabilities to observed mortality rates. (A) Before Platt scaling showing overconfidence. (B) After Platt scaling showing improved calibration across deciles.
  • Figure 5: SHAP Beeswarm Plot. SHAP values showing feature contributions to mortality risk predictions. Each point represents one child; color indicates feature value (red=high, blue=low). Positive SHAP values increase predicted mortality risk.
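The calibration step shown in Figure 4 can be illustrated with a plain version of Platt scaling: a two-parameter logistic map fitted on the logit of the model's raw probabilities, evaluated with the Brier score. The sketch below is an assumption-laden illustration, not the authors' implementation. It uses simple gradient descent without Platt's original target smoothing, the helper names are hypothetical, and the toy data is deliberately overconfident. In a temporal setup the parameters would be fitted on the validation cohort and applied to the test cohort; here both steps use the same toy data for brevity.

```python
import math


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(raw_probs, labels, lr=0.5, steps=2000):
    """Fit a, b in sigmoid(a * logit(p) + b) by gradient descent on log loss."""
    xs = [logit(p) for p in raw_probs]
    a, b = 1.0, 0.0  # start from the identity map
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(raw_probs, a, b):
    return [sigmoid(a * logit(p) + b) for p in raw_probs]


def brier_score(probs, labels):
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical overconfident model: predicts 0.9 where half the children die,
# and 0.2 where only 2 of 40 die.
raw = [0.9] * 10 + [0.2] * 40
labels = [1] * 5 + [0] * 5 + [1] * 2 + [0] * 38

a, b = fit_platt(raw, labels)
calibrated = apply_platt(raw, a, b)
brier_before = brier_score(raw, labels)
brier_after = brier_score(calibrated, labels)
print(f"Brier before: {brier_before:.3f}, after: {brier_after:.3f}")
```

Because the map is monotone in the raw score whenever the fitted slope `a` is positive, it changes the predicted probabilities but not the ranking of children, so AUROC and top-10% capture are unaffected by this step.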