Geo-Semantic-Parsing: AI-powered geoparsing by traversing semantic knowledge graphs

Leonardo Nizzoli, Marco Avvenuti, Maurizio Tesconi, Stefano Cresci

TL;DR

This work introduces Geo-Semantic-Parsing (GSP), a geoparsing framework that uses semantic annotation and knowledge-graph traversal to attach coordinates to location mentions in free text. It combines three steps: semantic annotation, information expansion across a knowledge graph, and regression-based selection of the best candidate. On NEEL16, GSP reports $F1=0.665$, outperforming the baselines and state-of-the-art geoparsers by achieving superior recall while maintaining precision. The approach benefits from diverse expansion strategies, robust feature engineering, and the use of graph/node embeddings, yielding strong performance across world-scale datasets and location granularity levels. GSP demonstrates practical potential for real-time, global geoparsing in crisis mapping and other OSN-driven geospatial applications, and points to future work on combining strategies and end-to-end embedding-based methods.

Abstract

Online social networks convey rich information about geospatial facets of reality. However, in most cases, geographic information is not explicit and structured, thus preventing its exploitation in real-time applications. We address this limitation by introducing a novel geoparsing and geotagging technique called Geo-Semantic-Parsing (GSP). GSP identifies location references in free text and extracts the corresponding geographic coordinates. To reach this goal, we employ a semantic annotator to identify relevant portions of the input text and to link them to the corresponding entity in a knowledge graph. Then, we devise and experiment with several efficient strategies for traversing the knowledge graph, thus expanding the available set of information for the geoparsing task. Finally, we exploit all available information for learning a regression model that selects the best entity with which to geotag the input text. We evaluate GSP on a well-known reference dataset including almost 10k event-related tweets, achieving $F1=0.66$. We extensively compare our results with those of 2 baselines and 3 state-of-the-art geoparsing techniques, achieving the best performance. On the same dataset, competitors obtain $F1 \leq 0.55$. We conclude by providing in-depth analyses of our results, showing that the overall superior performance of GSP is mainly due to a large improvement in recall with respect to existing techniques.

Paper Structure

This paper contains 51 sections, 2 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Logical overview of the 3 main steps applied by GSP to the input document $t_i$. Semantic annotation (step 1) links a relevant token (anchor) to an entity (red-colored node) within a reference knowledge graph. Expansion (step 2) identifies related entities (blue-colored nodes) that possibly convey useful geographic information. Selection (step 3) picks the best entity (green-colored node) to geotag the anchor.
  • Figure 2: Difference between vertical and horizontal expansion. Vertical expansion traverses semantically equivalent entities (purple-colored) across different knowledge graphs, whereas horizontal expansion considers those nodes that are most related (blue-colored) to the starting one, within the reference knowledge graph.
  • Figure 3: Toy example showing the nodes retrieved by the different expansion strategies on a small knowledge graph, for expansion size $L=2$.
  • Figure 4: Performance evaluation of the proposed expansion strategies, when applied individually and jointly, as a function of the expansion size $L$.
  • Figure 5: Assignment of regression labels to candidates retrieved by an expansion strategy of size $L=5$.
  • ...and 4 more figures
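
The three steps summarized in the Figure 1 caption (annotation, expansion, selection) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy knowledge graph, the relatedness weights, the substring-based annotator, and the helper names `annotate`, `expand`, `score`, `select`, and `geoparse` are all hypothetical. In particular, `score` is a hand-written stand-in for the learned regression model, and "expansion size" is read here as the number of related nodes retrieved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class Entity:
    name: str
    coords: Optional[Coords] = None  # None when the entity carries no coordinates


# Hypothetical toy knowledge graph (illustrative data only).
ENTITIES: Dict[str, Entity] = {
    "AS Roma": Entity("AS Roma"),
    "Rome": Entity("Rome", (41.9028, 12.4964)),
    "Stadio Olimpico": Entity("Stadio Olimpico", (41.9341, 12.4547)),
    "Serie A": Entity("Serie A"),
}
# Outgoing edges with a made-up relatedness weight in [0, 1].
RELATED: Dict[str, List[Tuple[str, float]]] = {
    "AS Roma": [("Rome", 0.9), ("Stadio Olimpico", 0.8), ("Serie A", 0.7)],
}
# Surface forms (anchors) the toy annotator recognizes, mapped to entity names.
ANCHORS: Dict[str, str] = {"as roma": "AS Roma", "rome": "Rome"}


def annotate(text: str) -> List[Tuple[str, str]]:
    """Step 1: link anchors found in the text to entities (naive substring match)."""
    lowered = text.lower()
    return [(anchor, name) for anchor, name in ANCHORS.items() if anchor in lowered]


def expand(start: str, size: int) -> List[str]:
    """Step 2: return the start entity plus its `size` most related neighbors."""
    neighbors = sorted(RELATED.get(start, []), key=lambda pair: pair[1], reverse=True)
    return [start] + [name for name, _ in neighbors[:size]]


def score(start: str, candidate: str) -> float:
    """Stand-in for the regression model: prefer the start entity, then relatedness."""
    if candidate == start:
        return 1.0
    return dict(RELATED.get(start, [])).get(candidate, 0.0)


def select(start: str, candidates: List[str]) -> Optional[str]:
    """Step 3: pick the best-scoring candidate that has coordinates, if any."""
    geo = [c for c in candidates if ENTITIES[c].coords is not None]
    return max(geo, key=lambda c: score(start, c)) if geo else None


def geoparse(text: str, size: int = 2) -> List[Tuple[str, str, Coords]]:
    """Run the three steps and return (anchor, chosen entity, coordinates) triples."""
    results = []
    for anchor, entity in annotate(text):
        best = select(entity, expand(entity, size))
        if best is not None:
            results.append((anchor, best, ENTITIES[best].coords))
    return results


print(geoparse("Fans celebrate AS Roma win"))
# [('as roma', 'Rome', (41.9028, 12.4964))]
```

In this toy run the anchor links to an entity without coordinates, so the geotag comes from a related entity reached through expansion. This is the kind of case where expansion would be expected to help recall.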