Large Language Models for Geolocation Extraction in Humanitarian Crisis Response
G. Cafferata, T. Demarco, K. Kalimeri, Y. Mejova, M. G. Beiró
TL;DR
This work investigates how to reduce geographic and socioeconomic biases in humanitarian geolocation extraction by coupling few-shot LLM-based NER with a context-aware geolocation agent that leverages GeoNames and Pelias. The proposed four-step pipeline (document preprocessing, NER tagging, post-processing, and a LangChain-powered geolocator) achieves higher accuracy and more uniform performance across regions than traditional baselines, aided by an improved HumSet dataset with refined literal toponym annotations. Key findings show that LLM-based NER, especially with Markdown-formatted prompts, delivers strong recall, while the agent-based geolocator markedly improves exact and distance-based geocoding accuracy and reduces fairness disparities. The work argues for integrating responsible AI principles, careful prompt design, and continuous auditing to advance inclusive, transparent geospatial data systems for global crisis response, moving toward the goal of leaving no place behind in crisis analytics.
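As a rough illustration of how the four steps named above fit together, the following minimal Python sketch chains preprocessing, tagging, post-processing, and context-aware geolocation. It is not the authors' implementation: the tagger is a simple lookup standing in for few-shot LLM-based NER, `TOY_GAZETTEER` is a hypothetical three-entry stand-in for GeoNames and Pelias queries, and the voting heuristic in `geolocate` is an assumed simplification of what a geolocation agent might do with document context. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer (stand-in for GeoNames/Pelias lookups):
# lowercase name -> candidate (country, latitude, longitude) entries.
TOY_GAZETTEER = {
    "tripoli": [("Libya", 32.8872, 13.1913), ("Lebanon", 34.4367, 35.8497)],
    "benghazi": [("Libya", 32.1167, 20.0667)],
    "beirut": [("Lebanon", 33.8938, 35.5018)],
}


def preprocess(text):
    """Step 1: collapse runs of whitespace left over from document extraction."""
    return re.sub(r"\s+", " ", text).strip()


def tag_toponyms(text):
    """Step 2: stand-in for LLM-based NER; keeps capitalized tokens found in the gazetteer."""
    tokens = re.findall(r"[A-Z][a-z]+", text)
    return [t for t in tokens if t.lower() in TOY_GAZETTEER]


def postprocess(mentions):
    """Step 3: drop duplicate mentions while preserving first-seen order."""
    return list(dict.fromkeys(mentions))


def geolocate(mentions):
    """Step 4: resolve each mention, using unambiguous mentions as context."""
    # Unambiguous toponyms in the same document vote for their country.
    votes = Counter()
    for mention in mentions:
        candidates = TOY_GAZETTEER[mention.lower()]
        if len(candidates) == 1:
            votes[candidates[0][0]] += 1
    # Ambiguous toponyms take the candidate whose country has the most votes
    # (ties fall back to the first gazetteer entry).
    resolved = {}
    for mention in mentions:
        candidates = TOY_GAZETTEER[mention.lower()]
        resolved[mention] = max(candidates, key=lambda c: votes[c[0]])
    return resolved


def run_pipeline(text):
    return geolocate(postprocess(tag_toponyms(preprocess(text))))


libya_doc = run_pipeline("Clashes displaced families in Tripoli and\n Benghazi.")
lebanon_doc = run_pipeline("Aid convoys reached Tripoli from Beirut.")
```

In this toy setting the same surface form, "Tripoli", resolves to different countries depending on which other places the document mentions, which is the kind of context-driven disambiguation the summary attributes to the geolocation agent.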
Abstract
Humanitarian crises demand timely and accurate geographic information to inform effective response efforts. Yet, automated systems that extract locations from text often reproduce existing geographic and socioeconomic biases, leading to uneven visibility of crisis-affected regions. This paper investigates whether Large Language Models (LLMs) can address these geographic disparities in extracting location information from humanitarian documents. We introduce a two-step framework that combines few-shot LLM-based named entity recognition with an agent-based geocoding module that leverages context to resolve ambiguous toponyms. We benchmark our approach against state-of-the-art pretrained and rule-based systems using both accuracy and fairness metrics across geographic and socioeconomic dimensions. Our evaluation uses an extended version of the HumSet dataset with refined literal toponym annotations. Results show that LLM-based methods substantially improve both the precision and fairness of geolocation extraction from humanitarian texts, particularly for underrepresented regions. By bridging advances in LLM reasoning with principles of responsible and inclusive AI, this work contributes to more equitable geospatial data systems for humanitarian response, advancing the goal of leaving no place behind in crisis analytics.
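To make the idea of evaluating accuracy and fairness across geographic groups concrete, the sketch below computes a distance-based accuracy per group and a simple disparity score. It is an assumed formulation rather than the metrics used in the paper: the 161 km threshold is a conventional choice borrowed from the geocoding literature, the max-minus-min disparity is one of several possible fairness summaries, and the record layout and function names are hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values marginally above 1.
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, a)))


def accuracy_by_group(records, threshold_km=161.0):
    """Share of predictions within threshold_km of the gold location, per group.

    Each record is (group, pred_lat, pred_lon, gold_lat, gold_lon).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, pred_lat, pred_lon, gold_lat, gold_lon in records:
        totals[group] += 1
        if haversine_km(pred_lat, pred_lon, gold_lat, gold_lon) <= threshold_km:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def disparity(acc_by_group):
    """Gap between the best-served and worst-served group (0 means uniform)."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


# Illustrative records only, not data from the paper.
records = [
    ("group_a", 33.8938, 35.5018, 33.8938, 35.5018),  # exact match
    ("group_a", 32.1167, 20.0667, 32.1167, 20.0667),  # exact match
    ("group_b", 32.8872, 13.1913, 32.8872, 13.1913),  # exact match
    ("group_b", 32.8872, 13.1913, 34.4367, 35.8497),  # wrong Tripoli, far off
]
per_group = accuracy_by_group(records)
gap = disparity(per_group)
```

A system can score well on average while leaving a large gap between groups, which is why reporting the per-group breakdown alongside the overall accuracy matters for the fairness question the abstract raises.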
