GeoSEE: Regional Socio-Economic Estimation With a Large Language Model

Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha

TL;DR

GeoSEE presents a universal LLM-based pipeline to estimate regional socio-economic indicators from multimodal proxies. It uses a two-stage process: first, LLM-guided module selection to extract task-relevant information, then in-context learning on serialized regional descriptions to predict indicators in unsupervised and low-shot settings. Across multiple countries and data-scarce scenarios, GeoSEE outperforms baselines and demonstrates transferability and potential for detecting temporal changes, with ablations confirming the value of module selection and demonstration strategies. The approach offers a scalable, extensible framework for subnational analytics and SDG monitoring where traditional surveys are limited by cost and access.
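The first stage, LLM-guided module selection, can be illustrated with a small sketch. It assumes a hypothetical catalogue of module names and descriptions, a simple comma-separated reply format, and a stand-in `llm` callable in place of a real model call. It only shows the general shape of "describe the modules, ask the LLM which are relevant, keep the valid names" and is not the authors' prompt or code (their actual prompts appear in Figures 1 and 4).

```python
from typing import Callable, Dict, List

# Hypothetical module catalogue: name -> short description shown to the LLM.
MODULES: Dict[str, str] = {
    "nightlight": "Mean night-time light intensity of the region",
    "land_cover": "Share of built-up, cropland, forest and water pixels from satellite segmentation",
    "road_density": "Length of road network per square kilometre",
    "area": "Total land area of the region in square kilometres",
}


def build_selection_prompt(indicator: str, country: str, modules: Dict[str, str]) -> str:
    """Ask the LLM which modules are informative for one indicator and country."""
    lines = [f"- {name}: {desc}" for name, desc in modules.items()]
    return (
        f"You are estimating '{indicator}' for regions of {country}.\n"
        "Available information modules:\n" + "\n".join(lines) + "\n"
        "Answer with a comma-separated list of the module names that are most relevant."
    )


def parse_selection(reply: str, modules: Dict[str, str]) -> List[str]:
    """Keep only names that exist in the catalogue, preserving the LLM's order."""
    chosen: List[str] = []
    for token in reply.split(","):
        name = token.strip().strip(".").lower()
        if name in modules and name not in chosen:
            chosen.append(name)
    return chosen


def select_modules(indicator: str, country: str, llm: Callable[[str], str]) -> List[str]:
    """Stage 1 (sketch): one LLM call per (indicator, country) pair."""
    prompt = build_selection_prompt(indicator, country, MODULES)
    return parse_selection(llm(prompt), MODULES)


# Stand-in LLM; a real system would call a language model here.
fake_llm = lambda prompt: "nightlight, land_cover, unknown_module"
print(select_modules("population", "Malawi", fake_llm))  # ['nightlight', 'land_cover']
```

Filtering the reply against the catalogue is a defensive design choice: it discards hallucinated module names before any module is actually run.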

Abstract

Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Presented with a diverse set of information modules, including those pre-constructed from satellite imagery, GeoSEE selects which modules to use for each indicator and country. This selection is guided by the LLM's prior socio-geographic knowledge, which functions similarly to the insights of a domain expert. The system then aggregates the outputs of the selected modules into natural-language text and computes target indicators via in-context learning. Comprehensive evaluation across countries at various stages of development reveals that our method outperforms other predictive models in both unsupervised and low-shot contexts. This reliable performance in data-scarce settings in under-developed or developing countries, combined with its cost-effectiveness, underscores its potential to continuously support and monitor progress toward the Sustainable Development Goals, such as poverty alleviation and equitable growth, on a global scale.
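The second stage, aggregating module outputs into text and estimating via in-context learning, can be sketched as prompt construction. This assumes a hypothetical sentence template for serializing a region, a hypothetical "indicator: value" answer format, and toy feature values. With zero demonstrations it reduces to the unsupervised setting, and with k labelled regions it becomes a k-shot prompt. The exact templates used by GeoSEE are shown in the paper's figures; this is only an illustration of the idea.

```python
import re
from typing import Dict, List, Optional, Tuple


def serialize_region(name: str, features: Dict[str, float]) -> str:
    """Turn selected module outputs into a short natural-language description."""
    parts = [f"{key.replace('_', ' ')} is {value:.2f}" for key, value in sorted(features.items())]
    return f"In {name}, " + ", ".join(parts) + "."


def build_icl_prompt(
    indicator: str,
    demos: List[Tuple[str, Dict[str, float], float]],
    target: Tuple[str, Dict[str, float]],
) -> str:
    """Labelled demonstrations followed by the unlabelled target region.
    An empty `demos` list gives the unsupervised (zero-shot) prompt."""
    blocks = [f"{serialize_region(n, f)}\n{indicator}: {label}" for n, f, label in demos]
    t_name, t_feats = target
    blocks.append(f"{serialize_region(t_name, t_feats)}\n{indicator}:")
    return "\n\n".join(blocks)


def parse_estimate(reply: str) -> Optional[float]:
    """Extract the first number from the LLM reply (thousands separators removed)."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply.replace(",", ""))
    return float(match.group()) if match else None


# Toy values for illustration only (hypothetical regions and features).
demos = [("Region A", {"nightlight": 0.5, "road_density": 1.2}, 120000)]
target = ("Region B", {"nightlight": 0.1, "road_density": 0.3})
prompt = build_icl_prompt("population", demos, target)
print(prompt)
```

The prompt ends with the indicator name and a colon so that the model's continuation is the estimate itself, which `parse_estimate` then converts to a number.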

Paper Structure (43 sections, 2 equations, 5 figures, 13 tables, 1 algorithm)

Figures (5)

  • Figure 1: Prompt for module selection in GeoSEE. An example of a full prompt is shown in Appendix A.
  • Figure 2: (a) Averaged Pearson correlation over four indicators (POP, ELP, HER, LPR) for each country in a 5-shot setting shows that our model's information modules improve LLM inference beyond what the LLM's prior knowledge alone provides. (b) Averaged Pearson correlation when transferring from one country (the source country) to another (the target country). Rows represent the source country and columns represent the target country; diagonal entries correspond to evaluations in the unsupervised setting without transfer.
  • Figure 3: Qualitative analysis of predicting changes between two timestamps, with example satellite images, segmentation maps, and paragraphs. The analysis covers the Hwaseong City area in South Korea. Differences captured by the modules constructed from satellite imagery lead to different estimates, which indicate positive growth in the area, consistent with the ground truth. (Colored text and triangle symbols are for illustration only and are not given to the LLM.)
  • Figure 4: Example prompt for module selection of GeoSEE to predict population in Malawi.
  • Figure 5: Example prompt for inferring population in Malawi.
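The evaluation metric in Figure 2 can be made concrete with a short sketch. It assumes that the per-country score is the arithmetic mean of the Pearson correlations between predicted and ground-truth values across regions, computed separately for each indicator; the captions do not spell out the averaging, so treat that as an assumption. The numbers in the usage lines are toy values, not results from the paper.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def averaged_pearson(preds: Dict[str, Sequence[float]], truth: Dict[str, Sequence[float]]) -> float:
    """Assumed metric: mean of per-indicator correlations over the regions of one country."""
    scores = [pearson(preds[k], truth[k]) for k in truth]
    return sum(scores) / len(scores)


# Toy predictions and ground truth for two indicators over four regions.
truth = {"POP": [12, 19, 33, 41], "ELP": [0.1, 0.2, 0.3, 0.4]}
preds = {"POP": [10, 20, 30, 40], "ELP": [0.2, 0.1, 0.4, 0.3]}
print(round(averaged_pearson(preds, truth), 3))
```

In a transfer matrix like Figure 2(b), the same score would be computed once per (source, target) pair, with demonstrations drawn from the source country and evaluation on the target country.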