GeoSEE: Regional Socio-Economic Estimation With a Large Language Model
Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha
TL;DR
GeoSEE is a unified LLM-based pipeline for estimating regional socio-economic indicators from multimodal proxies. It works in two stages: first, LLM-guided module selection extracts task-relevant information; then in-context learning on serialized regional descriptions predicts indicators in unsupervised and low-shot settings. Across multiple countries and data-scarce scenarios, GeoSEE outperforms baselines and shows transferability and potential for detecting temporal changes, with ablations confirming the value of module selection and demonstration strategies. The approach offers a scalable, extensible framework for subnational analytics and SDG monitoring where traditional surveys are limited by cost and access.
Abstract
Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. This research presents GeoSEE, a method that estimates various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Presented with a diverse set of information modules, including those pre-constructed from satellite imagery, GeoSEE selects which modules to use in estimation for each indicator and country. This selection is guided by the LLM's prior socio-geographic knowledge, which functions similarly to the insights of a domain expert. The system then aggregates the results from the selected modules into natural-language text and computes target indicators via in-context learning. Comprehensive evaluation across countries at various stages of development reveals that our method outperforms other predictive models in both unsupervised and low-shot contexts. This reliable performance in data-scarce settings in under-developed or developing countries, combined with its cost-effectiveness, underscores its potential to continuously support and monitor progress toward the Sustainable Development Goals, such as poverty alleviation and equitable growth, on a global scale.
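The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the module names (nightlight, road_density, building_count), the feature values, and the prompt wording are all hypothetical, and the LLM calls themselves are left out. The sketch only shows how a module-selection reply might be parsed and how the selected modules' outputs might be serialized into a few-shot prompt.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical information modules: each turns raw regional features into
# one line of natural-language text. Names and wording are assumptions.
MODULES: Dict[str, Callable[[Features], str]] = {
    "nightlight": lambda f: f"Average nightlight intensity: {f['nightlight']:.1f}",
    "road_density": lambda f: f"Road density (km per km^2): {f['road_density']:.2f}",
    "building_count": lambda f: f"Number of buildings: {int(f['building_count'])}",
}


def parse_selection(reply: str, available: List[str]) -> List[str]:
    """Stage 1 (assumed reply format): keep the known module names from a
    comma-separated LLM reply, ignoring anything unrecognized."""
    wanted = [token.strip() for token in reply.split(",")]
    return [name for name in wanted if name in available]


def serialize(region: str, feats: Features, selected: List[str]) -> str:
    """Describe one region as text using only the selected modules."""
    lines = [f"Region: {region}"]
    lines.extend(MODULES[name](feats) for name in selected)
    return "\n".join(lines)


def build_prompt(
    indicator: str,
    selected: List[str],
    demos: List[Tuple[str, Features, float]],
    query: Tuple[str, Features],
) -> str:
    """Stage 2: assemble an in-context learning prompt. An empty `demos`
    list gives a prompt with no labeled examples."""
    parts = [f"Estimate the {indicator} of the final region."]
    for name, feats, label in demos:
        parts.append(f"{serialize(name, feats, selected)}\n{indicator}: {label}")
    query_name, query_feats = query
    parts.append(f"{serialize(query_name, query_feats, selected)}\n{indicator}:")
    return "\n\n".join(parts)


# Made-up values, for illustration only.
region_a = {"nightlight": 12.3, "road_density": 0.85, "building_count": 4200}
region_b = {"nightlight": 3.1, "road_density": 0.20, "building_count": 610}

selected = parse_selection("nightlight, road_density, unknown_module", list(MODULES))
prompt = build_prompt(
    "population", selected, [("Region A", region_a, 51000.0)], ("Region B", region_b)
)
print(prompt)
```

The resulting prompt string would be sent to an LLM, whose completion after the final "population:" is read as the estimate. Keeping each module as a function that returns text is one way to make the set of modules easy to extend, in line with the extensibility the abstract emphasizes.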
