RACCOON: A Retrieval-Augmented Generation Approach for Location Coordinate Capture from News Articles

Jonathan Lin, Aditya Joshi, Hye-young Paik, Tri Dung Doung, Deepti Gurdasani

TL;DR

RACCOON geocodes incidents reported in news articles using a Retrieval-Augmented Generation framework that links a GeoNames gazetteer-based retriever to a pre-trained LLM, which generates the coordinates. The method introduces country-assisted retrieval, population-based candidate re-ranking, and context-rich prompts so that locations can be resolved accurately even when a place name is ambiguous. Evaluations on GeoVirus, GeoWebNews, and LGL show that RACCOON improves precision and coverage relative to baselines and a non-RAG prompt approach, though challenges remain in recall and population bias, particularly on locally concentrated datasets. The work demonstrates the potential of RAG in geospatial information extraction and suggests directions for future enhancement with open-source LLMs, fine-tuning, and reduced dependency on large gazetteers.

Abstract

Geocoding involves automatic extraction of location coordinates of incidents reported in news articles, and can be used for epidemic intelligence or disaster management. This paper introduces Retrieval-Augmented Coordinate Capture Of Online News articles (RACCOON), an open-source geocoding approach that extracts geolocations from news articles. RACCOON uses a retrieval-augmented generation (RAG) approach where candidate locations and associated information are retrieved in the form of context from a location database, and a prompt containing the retrieved context, location mentions and news articles is fed to an LLM to generate the location coordinates. Our evaluation on three datasets, with two underlying LLMs, three baselines and several ablation tests based on the components of RACCOON, demonstrates the utility of RACCOON. To the best of our knowledge, RACCOON is the first RAG-based approach for geocoding using pre-trained LLMs.
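The retrieve-then-prompt flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` type, the `retrieve` and `build_prompt` helpers, the prompt wording, and the four-entry toy gazetteer (with approximate, illustrative coordinates and populations) are all assumptions made for this example. A real system would query GeoNames and send the prompt to an LLM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry (hypothetical schema for this sketch)."""
    name: str
    country: str  # ISO 3166-1 alpha-2 code
    lat: float
    lon: float
    population: int


# Toy stand-in for a gazetteer; values are approximate and illustrative only.
TOY_GAZETTEER = [
    Candidate("Paris", "FR", 48.8566, 2.3522, 2_100_000),
    Candidate("Paris", "US", 33.6609, -95.5555, 25_000),
    Candidate("London", "GB", 51.5074, -0.1278, 8_900_000),
    Candidate("London", "CA", 42.9849, -81.2453, 420_000),
]


def retrieve(mention: str, country: Optional[str] = None, top_k: int = 3) -> List[Candidate]:
    """Return candidates matching the mention, ranked by population."""
    matches = [c for c in TOY_GAZETTEER if c.name.lower() == mention.lower()]
    if country is not None:
        # Country-assisted retrieval: prefer candidates in the given country,
        # but fall back to all matches if the filter removes everything.
        in_country = [c for c in matches if c.country == country]
        if in_country:
            matches = in_country
    # Population-based re-ranking: most populous candidates first.
    return sorted(matches, key=lambda c: c.population, reverse=True)[:top_k]


def build_prompt(article: str, mention: str, candidates: List[Candidate]) -> str:
    """Assemble a prompt with the article, the mention, and retrieved context."""
    context = "\n".join(
        f"- {c.name}, {c.country}: lat={c.lat}, lon={c.lon}, population={c.population}"
        for c in candidates
    )
    return (
        f"Article:\n{article}\n\n"
        f"Location mention: {mention}\n\n"
        f"Candidate locations:\n{context}\n\n"
        "Return the latitude and longitude of the mentioned location."
    )


if __name__ == "__main__":
    article = "Officials in Paris, Texas reported flooding on Monday."
    print(build_prompt(article, "Paris", retrieve("Paris", country="US")))
```

Ranking purely by population favours large cities, which is one way the population bias mentioned in the TL;DR could arise; the country filter is what lets the smaller Paris surface in the usage example.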

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Architecture of RACCOON.
  • Figure 2: Relationship between population size and accuracy @161km for the GeoWebNews and LGL datasets.
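
The accuracy @161km metric named in the Figure 2 caption is commonly computed in geocoding work as the fraction of predictions that fall within 161 km (about 100 miles) of the gold coordinates. The sketch below assumes that definition and uses great-circle (haversine) distance on a spherical Earth; the function names and the sample coordinates are illustrative and do not come from the paper.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_km(preds: List[Coord], golds: List[Coord], threshold_km: float = 161.0) -> float:
    """Fraction of predictions within threshold_km of the gold coordinates."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and of equal length")
    hits = sum(
        haversine_km(p[0], p[1], g[0], g[1]) <= threshold_km
        for p, g in zip(preds, golds)
    )
    return hits / len(golds)


if __name__ == "__main__":
    paris, london = (48.8566, 2.3522), (51.5074, -0.1278)
    # First prediction is exact; the second is several hundred km off.
    print(accuracy_at_km([paris, paris], [paris, london]))
```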