Large Language Models for Geolocation Extraction in Humanitarian Crisis Response

G. Cafferata; T. Demarco; K. Kalimeri; Y. Mejova; M. G. Beiró

TL;DR

This work investigates how to reduce geographic and socioeconomic biases in humanitarian geolocation extraction by coupling few-shot LLM-based NER with a context-aware geolocation agent that leverages GeoNames and Pelias. The proposed four-step pipeline (document preprocessing, NER tagging, post-processing, and a LangChain-powered geolocator) achieves higher accuracy and more uniform performance across regions than traditional baselines, aided by an improved HumSet dataset with refined literal toponym annotations. Key findings show that LLM-based NER, especially with Markdown prompts, delivers strong recall, while the agent-based geolocator markedly improves exact and distance-based geocoding accuracy and reduces fairness disparities. The work argues for integrating responsible AI principles, prompt design, and continuous auditing to advance inclusive, transparent geospatial data systems for global crisis response, moving toward the goal of leaving no place behind in crisis analytics.

Abstract

Humanitarian crises demand timely and accurate geographic information to inform effective response efforts. Yet, automated systems that extract locations from text often reproduce existing geographic and socioeconomic biases, leading to uneven visibility of crisis-affected regions. This paper investigates whether Large Language Models (LLMs) can address these geographic disparities in extracting location information from humanitarian documents. We introduce a two-step framework that combines few-shot LLM-based named entity recognition with an agent-based geocoding module that leverages context to resolve ambiguous toponyms. We benchmark our approach against state-of-the-art pretrained and rule-based systems using both accuracy and fairness metrics across geographic and socioeconomic dimensions. Our evaluation uses an extended version of the HumSet dataset with refined literal toponym annotations. Results show that LLM-based methods substantially improve both the precision and fairness of geolocation extraction from humanitarian texts, particularly for underrepresented regions. By bridging advances in LLM reasoning with principles of responsible and inclusive AI, this work contributes to more equitable geospatial data systems for humanitarian response, advancing the goal of leaving no place behind in crisis analytics.
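
To make the idea of context-based toponym resolution concrete, the following is a minimal sketch and not the paper's agent. It assumes a tiny hypothetical in-memory gazetteer (`TOY_GAZETTEER`) in place of GeoNames queries and a simple country-name match in place of LLM reasoning; the names `Candidate` and `resolve` are illustrative, and the coordinates are approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry for a place name (hypothetical schema)."""
    name: str
    country: str
    lat: float
    lon: float


# Toy stand-in for a real gazetteer; coordinates are approximate.
TOY_GAZETTEER = {
    "tripoli": [
        Candidate("Tripoli", "Libya", 32.89, 13.19),
        Candidate("Tripoli", "Lebanon", 34.44, 35.85),
    ],
}


def resolve(toponym: str, context: str,
            gazetteer: dict = TOY_GAZETTEER) -> Optional[Candidate]:
    """Pick the candidate whose country is mentioned in the surrounding text.

    Falls back to the first listed candidate when the context gives no hint,
    which mimics a context-free baseline that always returns a default entry.
    """
    candidates = gazetteer.get(toponym.lower(), [])
    if not candidates:
        return None
    context_lower = context.lower()
    for candidate in candidates:
        if candidate.country.lower() in context_lower:
            return candidate
    return candidates[0]


if __name__ == "__main__":
    text = "Clashes displaced families in northern Lebanon, near Tripoli."
    print(resolve("Tripoli", text))
```

The fallback branch shows why a context-free resolver can place an ambiguous toponym in the wrong country: without the surrounding text, every mention of the name maps to the same default entry.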

Paper Structure (22 sections, 3 figures, 6 tables)

Introduction
Related Work
Methodology
Document preprocessing
LLM-based NER tagging
Post-processing
Geolocation Agent
Dataset
Annotations' improvement
Experiments
NER tagging performance
Geolocation performance
Fairness Assessment
Results
NER Tagging Performance
...and 7 more sections

Figures (3)

Figure 1: Overview of our LLM-based geolocation extraction pipeline. Humanitarian documents are chunked and sent to a few-shot LLM for location NER tagging, followed by post-processing that aligns and merges literal toponyms. An agent-based geocoder then uses contextual reasoning and GeoNames queries to resolve each toponym to coordinates, enabling us to evaluate both accuracy and fairness of geolocation across geographic and socioeconomic groups.
Figure 2: Precision–recall trade-offs for the evaluated LLM-based NER taggers under JSON and Markdown output formats. JSON configurations generally yield higher precision, while Markdown formats achieve higher recall, reflecting the tendency of Markdown prompts to capture more toponyms at the cost of increased false positives.
Figure 3: Geographic distribution of geocoding errors for the rule-based baseline (left) and our agent-based geolocator (right). Each point represents a toponym from the annotated evaluation set, plotted at its gold-standard location and colored by distance (in km) between the system’s predicted coordinates and the ground truth. Darker colors indicate larger errors. The rule-based method exhibits frequent long-distance misclassifications, including cross-continent mismatches, whereas the agent-based geolocator substantially reduces large-error cases and achieves more geographically consistent performance across regions.
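
The distance-based error plotted in Figure 3 (kilometres between predicted and gold coordinates) can be illustrated with a great-circle computation. This is a minimal sketch under assumptions: the source does not say which distance formula is used, so the haversine formula is a stand-in here, and `accuracy_within` with its threshold argument is a hypothetical helper rather than the paper's metric definition.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(errors_km: list, threshold_km: float) -> float:
    """Fraction of toponyms whose error is at most threshold_km (assumed metric)."""
    if not errors_km:
        raise ValueError("errors_km must not be empty")
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


if __name__ == "__main__":
    # Hypothetical (predicted, gold) coordinate pairs, for illustration only.
    pairs = [((34.44, 35.85), (34.44, 35.85)),
             ((32.89, 13.19), (34.44, 35.85))]
    errors = [haversine_km(*pred, *gold) for pred, gold in pairs]
    print(errors, accuracy_within(errors, threshold_km=100.0))
```

Computing such errors per region or income group, rather than only in aggregate, is what makes disparities between groups visible.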
