
Earthquake Response Analysis with AI

Deep Patel, Panthadeep Bhattacharjee, Amit Reza, Priodyuti Pradhan

TL;DR

The paper tackles real-time earthquake response by exploiting Twitter data to infer affected locations and generate severity maps via NLP. It introduces a transfer-learning named entity recognition (NER) pipeline. The pipeline is pre-trained on earthquake-prone locations from GeoNames and on disaster keywords, then fine-tuned on synthetic and real tweets to extract GPE (geopolitical entity) and DISASTER entities. Experimental results from a case study of the 1 January 2024 Japan earthquake show high tagging accuracy (up to ~96%). They also show a strong correlation between the Twitter-derived severity maps and seismic epicenter data, highlighting the approach's value for responders. The authors discuss limitations, including data availability, misalignment between geo-tags and tweet content, and scalability. They plan to extend the work to larger models and to multilingual, real-time processing via LLMs, and to release the code on GitHub.
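The synthetic-tweet generation behind this pipeline can be sketched in Python. Per Figure 1, location mentions in earthquake tweets are replaced with Japanese place names, and then GPE and DISASTER spans are tagged. This is a minimal illustration under stated assumptions, not the authors' code. The {LOC} placeholder templates, the short location and keyword lists, and the helpers make_example and synthesize are all hypothetical. In the paper, the locations come from GeoNames and the source tweets come from a Kaggle dataset on the 2023 Kahramanmaraş earthquake. Annotations are emitted as (start, end, label) character offsets, the form spaCy's Example.from_dict accepts under the "entities" key.

```python
import random
import re

# Hypothetical inputs: template tweets whose original location mention was
# replaced by a {LOC} placeholder, a few Japanese place names (the paper draws
# these from GeoNames), and a few disaster keywords.
TEMPLATES = [
    "Strong earthquake just hit {LOC}, buildings collapsed",
    "Praying for everyone in {LOC} after the tsunami warning",
]
JAPAN_LOCATIONS = ["Wajima", "Suzu", "Kanazawa", "Noto"]
DISASTER_KEYWORDS = ["earthquake", "tsunami", "collapsed", "aftershock"]


def make_example(template, location, keywords):
    """Fill one template and return (text, {"entities": [(start, end, label)]})."""
    prefix, suffix = template.split("{LOC}", 1)
    text = prefix + location + suffix
    # The substituted place name is tagged as a geopolitical entity.
    entities = [(len(prefix), len(prefix) + len(location), "GPE")]
    # Every whole-word, case-insensitive keyword match is tagged as DISASTER.
    for kw in keywords:
        for m in re.finditer(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE):
            entities.append((m.start(), m.end(), "DISASTER"))
    return text, {"entities": sorted(entities)}


def synthesize(templates, locations, keywords, n, seed=0):
    """Generate n synthetic examples by randomly pairing templates and places."""
    rng = random.Random(seed)
    return [
        make_example(rng.choice(templates), rng.choice(locations), keywords)
        for _ in range(n)
    ]


if __name__ == "__main__":
    text, ann = make_example(TEMPLATES[0], "Wajima", DISASTER_KEYWORDS)
    print(text)
    print(ann)
```

Tagging every keyword match is the simplest choice. It does not guard against a keyword occurring inside a place name. That case would create overlapping spans, which NER trainers typically reject.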

Abstract

A timely and effective response is crucial to minimize damage and save lives during natural disasters like earthquakes. Microblogging platforms, particularly Twitter, have emerged as valuable real-time information sources for such events. This work explores the potential of leveraging Twitter data for earthquake response analysis. We develop a machine learning (ML) framework by incorporating natural language processing (NLP) techniques to extract and analyze relevant information from tweets posted during earthquake events. The approach primarily focuses on extracting location data from tweets to identify affected areas, generating severity maps, and utilizing WebGIS to display valuable information. The insights gained from this analysis can aid emergency responders, government agencies, humanitarian organizations, and NGOs in enhancing their disaster response strategies and facilitating more efficient resource allocation during earthquake events.
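One way to turn extracted locations into a severity map is to count how many disaster tweets mention each place and then attach coordinates. The following is a minimal Python sketch that assumes the NER step already yields a list of GPE strings per tweet. The GAZETTEER table holds approximate, rounded coordinates for three places in Ishikawa Prefecture. Both that table and the rule of scaling counts to [0, 1] by the largest count are illustrative assumptions, not details from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative gazetteer: lowercase place name -> approximate (lat, lon).
# A real pipeline would resolve names against GeoNames or a geocoder.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "wajima": (37.39, 136.90),
    "suzu": (37.44, 137.26),
    "kanazawa": (36.56, 136.65),
}


def severity_points(gpe_per_tweet: List[List[str]],
                    gazetteer: Dict[str, Tuple[float, float]]):
    """Return (name, lat, lon, count, severity) tuples, most-mentioned first.

    gpe_per_tweet holds the GPE strings the NER model found in each disaster
    tweet. Severity is the mention count divided by the maximum count, an
    assumed scoring rule used here only for illustration.
    """
    counts: Counter = Counter()
    for gpes in gpe_per_tweet:
        # A place mentioned several times in one tweet is counted once.
        for name in {g.strip().lower() for g in gpes}:
            if name in gazetteer:  # names that cannot be geocoded are skipped
                counts[name] += 1
    if not counts:
        return []
    top = max(counts.values())
    return [(name, *gazetteer[name], n, n / top)
            for name, n in counts.most_common()]
```

Counting each place at most once per tweet prevents a single tweet that repeats a name from inflating that place's score. The resulting (lat, lon, severity) points can then be passed to a WebGIS layer for display.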

Paper Structure

This paper contains 7 sections, 1 equation, 5 figures, 3 tables, 4 algorithms.

Figures (5)

  • Figure 1: Our framework comprises the training and testing phases of the AI models. For the training data, we collect tweets about the Kahramanmaraş Earthquake (Turkey-Syria, 2023) from Kaggle and replace the locations mentioned in them with known locations in Japan. We then tag the Japanese locations as GPE and the disaster-related keywords as DISASTER, yielding the final training dataset. Next, a pre-trained Named Entity Recognition (NER) model is trained on this dataset. At test time, real tweets are collected from Twitter, preprocessed, and passed through the trained model to extract locations. These locations are mapped to latitude and longitude to build the severity map. Finally, the severity map is compared with the epicenter map.
  • Figure 2: Worldwide earthquake frequencies by country over the past 125 years for earthquakes of magnitude greater than 7.0 on the Richter scale. A total of 63 countries are represented.
  • Figure 3: Illustration of the preprocessing algorithm applied to a sample tweet, from the raw tweet to the cleaned tweet. Each highlighted portion of the text is the target of the corresponding processing step. Special characters such as emojis may not be visible when reading but are still present in the text. These characters are also detected in the first step.
  • Figure 4: (a) Tweets collected during the Japan earthquake of 1 January 2024. (b) Severity map generated during the testing phase from the locations our model extracts from disaster tweets. (c) Epicenter map plotted from seismic data in the US Geological Survey (USGS) Earthquake Catalog [USGS].
  • Figure 5: NER model on the second dataset. (a) Training loss vs. epochs. The curve shows the model converging during transfer learning, with the loss decreasing as the number of epochs increases. (b) Confusion matrix of the trained NER model. The model classifies each word of the test dataset into one of three entity classes: GPE (Geopolitical Entity), DISASTER, or O (Other). The O class covers words that are neither GPE nor DISASTER, such as general terms, non-entity references, or information irrelevant to the classification task.
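The preprocessing step shown in Figure 3 can be approximated with a few standard-library passes in Python. This summary does not give the paper's exact step sequence, so the order below is an assumption, and clean_tweet is a hypothetical name. The assumed order is: decode HTML entities, then remove the retweet marker, URLs, user mentions, the # of hashtags, emojis and invisible format characters, and finally extra whitespace. Emojis are removed by Unicode category rather than by stripping all non-ASCII text. This way, place names in Japanese script or with diacritics (for example Kahramanmaraş) survive.

```python
import html
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+:?")      # also absorbs the colon in "RT @user:"
RETWEET_RE = re.compile(r"^RT\s+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")

# Unicode categories treated as noise: "So" covers most emoji and pictographs,
# "Cf" covers invisible format characters such as the zero-width joiner.
DROP_CATEGORIES = {"So", "Cf"}
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def clean_tweet(raw: str) -> str:
    """Apply an assumed sequence of cleaning steps to one raw tweet."""
    text = html.unescape(raw)            # "&amp;" -> "&"
    text = RETWEET_RE.sub("", text)      # leading retweet marker
    text = URL_RE.sub(" ", text)         # links carry no location text
    text = MENTION_RE.sub(" ", text)     # user handles
    text = HASHTAG_RE.sub(r"\1", text)   # keep the hashtag word itself
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) not in DROP_CATEGORIES
        and ch not in VARIATION_SELECTORS
    )
    return SPACE_RE.sub(" ", text).strip()
```

Keeping the hashtag word matters because hashtags can carry the very location mentions the NER model is meant to find.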
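A confusion matrix over GPE, DISASTER, and O, like the one in Figure 5(b), is enough to recover the usual token-level scores. The following is a short Python sketch. It assumes rows are true labels and columns are predicted labels, because the figure's orientation is not stated here. The function names and the toy labels used to exercise them are hypothetical, not the paper's data. Accuracy is the diagonal sum divided by the total. Per-class precision divides the diagonal cell by its column sum, and recall divides it by its row sum.

```python
from typing import Dict, List, Sequence

LABELS = ["GPE", "DISASTER", "O"]


def confusion_matrix(gold: Sequence[str], pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count token pairs: rows index the true label, columns the prediction."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, pred):
        matrix[index[g]][index[p]] += 1
    return matrix


def token_metrics(matrix: List[List[int]], labels: Sequence[str]) -> Dict[str, float]:
    """Overall token accuracy plus per-class precision and recall."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    scores = {"accuracy": correct / total if total else 0.0}
    for i, lab in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        scores[f"{lab}_precision"] = tp / predicted if predicted else 0.0
        scores[f"{lab}_recall"] = tp / actual if actual else 0.0
    return scores
```

O tokens usually dominate tweets, so overall accuracy can stay high even when GPE recall is modest. Per-class scores are therefore a useful companion to a single accuracy figure.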