Gated-Attention Feature-Fusion Based Framework for Poverty Prediction
Muhammad Umer Ramzan, Wahab Khaddim, Muhammad Ehsan Rana, Usman Ali, Manohar Ali, Fiaz ul Hassan, Fatima Mehmood
TL;DR
This work estimates poverty from satellite imagery by extending ResNet50 with a Gated-Attention Feature-Fusion Module (GAFM) that jointly leverages global and local visual cues. The framework uses a two-phase transfer-learning approach: the CNN first predicts nightlight proxies with an $L_{MSE}$ loss, and a vanilla neural network then predicts income from the high-level CNN features with an $L_{MAE}$ loss. Squeeze-and-excitation (SE) blocks and the GAFM dynamically fuse the feature streams. The model reaches an $R^2$ of $0.75$, outperforming several prior methods by large margins, and its attention-guided feature selection improves targeting of impoverished areas. The work is of practical significance for timely, scalable poverty mapping in data-scarce regions; future directions include regional generalization and further architectural refinements.
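For reference, the quantities named above ($L_{MSE}$, $L_{MAE}$, and $R^2$) can be sketched in plain Python using their standard definitions. The function names and the toy values are illustrative assumptions, not the authors' code or data.

```python
# Standard definitions of the two losses and the evaluation metric.
# All names and sample values here are illustrative assumptions.

def mse(y_true, y_pred):
    """Mean squared error (phase 1: nightlight proxy regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error (phase 2: income regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy targets and predictions (made up for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(mse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

An $R^2$ of $0.75$ under this definition means the predictions account for 75% of the variance of the target around its mean.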
Abstract
This paper addresses the challenge of accurately estimating poverty levels with deep learning, particularly in developing regions where traditional methods such as household surveys are costly, infrequent, and quickly outdated. To address these issues, we propose a state-of-the-art Convolutional Neural Network (CNN) architecture that extends the ResNet50 model with a Gated-Attention Feature-Fusion Module (GAFM). The architecture is designed to improve the model's ability to capture and combine both global and local features from satellite images, leading to more accurate poverty estimates. The model achieves an $R^2$ score of 0.75, significantly outperforming existing leading methods in poverty mapping. We attribute this improvement to the model's capacity to focus on and refine the most relevant features while filtering out irrelevant information, which makes it a powerful tool for remote sensing and poverty estimation.
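To illustrate the general idea of gated fusion of global and local features, here is a minimal sketch in plain Python. The exact GAFM formulation is not given in this section, so the per-channel sigmoid gate, the function `gated_fusion`, and its parameters `w_global`, `w_local`, and `bias` are assumptions chosen for illustration, not the authors' design.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(global_feat, local_feat, w_global, w_local, bias):
    """Fuse two feature vectors with a per-channel gate (hypothetical form).

    gate_i  = sigmoid(w_global_i * g_i + w_local_i * l_i + bias_i)
    fused_i = gate_i * g_i + (1 - gate_i) * l_i
    """
    fused = []
    for g, l, wg, wl, b in zip(global_feat, local_feat, w_global, w_local, bias):
        gate = sigmoid(wg * g + wl * l + b)
        fused.append(gate * g + (1.0 - gate) * l)
    return fused

# Toy pooled descriptors for two channels (made up for illustration only).
global_feat = [1.0, 2.0]
local_feat = [3.0, 6.0]

# With zero weights and bias, every gate is 0.5, so the result is the average.
zeros = [0.0, 0.0]
balanced = gated_fusion(global_feat, local_feat, zeros, zeros, zeros)

# A large positive bias pushes the gate toward 1, favouring the global stream.
global_heavy = gated_fusion(global_feat, local_feat, zeros, zeros, [20.0, 20.0])

print(balanced, global_heavy)
```

In a trained network the gate parameters would be learned, so the balance between the global and local streams could vary per channel and per input rather than being fixed in advance.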
