Use of Boosting Algorithms in Household-Level Poverty Measurement: A Machine Learning Approach to Predict and Classify Household Wealth Quintiles in the Philippines

Erika Lynet Salvador

TL;DR

This study addresses the challenge of measuring household poverty levels in the Philippines by comparing five boosting algorithms (AdaBoost, CatBoost, GBM, LightGBM, XGBoost) on 2022 DHS data, using SMOTE to address class imbalance, applying feature selection, and evaluating with standard classification metrics. CatBoost achieves the highest overall accuracy (90.93%), with XGBoost, GBM, and LightGBM close behind and AdaBoost lagging significantly. Class-wise AUC-ROC analyses show strong discriminative performance for CatBoost, GBM, LightGBM, and XGBoost across poverty classes, while AdaBoost underperforms, especially for the Poorest and Poorer classes. The work demonstrates the potential of boosting methods to inform poverty prediction and targeted policy interventions in the Philippines, and it suggests enriching the dataset (e.g., with GPS data and night-light data) to further improve predictive accuracy and policy utility.
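
To make the class-imbalance step concrete, the following is a minimal sketch of the idea behind SMOTE: each synthetic row is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. It is an illustration only, not the study's implementation; the function name `smote_oversample`, the parameter defaults, and the toy feature rows are all hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class rows.

    Simplified illustration of the SMOTE idea; names and defaults are
    assumptions made for this sketch, not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority row as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy rows for an under-represented class (two numeric features).
poorest = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_oversample(poorest, n_new=6, k=2)
print(len(new_rows), "synthetic rows generated")
```

Because every synthetic row lies on a segment between two existing minority rows, the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.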

Abstract

This study assessed the effectiveness of machine learning models in predicting poverty levels in the Philippines using five boosting algorithms: Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost). CatBoost emerged as the superior model, achieving the highest scores across accuracy, precision, recall, and F1-score at 91 percent, while XGBoost and GBM followed closely at 89 percent and 88 percent, respectively. The research also examined the computational efficiency of these models, weighing training time, testing speed, and model size, all of which are crucial for real-world applications. Despite its longer training duration, CatBoost demonstrated high testing efficiency. These results indicate that machine learning can aid in poverty prediction and in the development of targeted policy interventions. Future studies should focus on incorporating a wider variety of data to enhance the predictive accuracy and policy utility of these models.
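
As a brief illustration of how the four reported metrics relate to a confusion matrix over the five wealth quintiles, here is a minimal sketch. The matrix values are invented for demonstration and are not the paper's results, the function name `macro_metrics` is hypothetical, and macro averaging is an assumption, since the abstract does not state which averaging scheme was used.

```python
def macro_metrics(cm):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of households whose true class is i and whose
    predicted class is j. Macro averaging is assumed for this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Invented 5x5 confusion matrix; rows and columns follow the DHS quintile
# order: Poorest, Poorer, Middle, Richer, Richest.
toy_cm = [
    [90, 8, 2, 0, 0],
    [7, 85, 8, 0, 0],
    [1, 6, 86, 7, 0],
    [0, 0, 8, 88, 4],
    [0, 0, 0, 5, 95],
]
report = macro_metrics(toy_cm)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

In a matrix like this, most errors fall in adjacent quintiles, which is why per-class precision and recall are worth reading alongside overall accuracy.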

Paper Structure (13 sections, 4 equations, 2 figures)


Figures (2)

  • Figure 1: Distribution of Missing Values across Features. Blue Line = 3,050
  • Figure 2: Confusion matrices, from left to right, for AdaBoost (Fig. 2.1), CatBoost (Fig. 2.2), GBM (Fig. 2.3), LightGBM (Fig. 2.4), and XGBoost (Fig. 2.5).