SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness

Tanmay Parekh, Jeffrey Kwan, Jiarui Yu, Sparsh Johri, Hyosang Ahn, Sreya Muppalla, Kai-Wei Chang, Wei Wang, Nanyun Peng

TL;DR

This work introduces the first multilingual Event Extraction (EE) framework SPEED++ for extracting epidemic event information for any disease and language, and exploits its argument extraction capabilities to aggregate community epidemic discussions like symptoms and cure measures, aiding misinformation detection and public attention monitoring.

Abstract

Social media is often the first place where communities discuss the latest societal trends. Prior works have utilized this platform to extract epidemic-related information (e.g., infections, preventive measures) to provide early warnings for epidemic prediction. However, these works focused only on English posts, while epidemics can occur anywhere in the world, and early discussions are often in local, non-English languages. In this work, we introduce the first multilingual Event Extraction (EE) framework SPEED++ for extracting epidemic event information for a wide range of diseases and languages. To this end, we extend a previous epidemic ontology with 20 argument roles and curate our multilingual EE dataset SPEED++, comprising 5.1K tweets in four languages for four diseases. Annotating data in every language is infeasible; thus, we develop zero-shot cross-lingual cross-disease models (i.e., trained only on English COVID data) utilizing multilingual pre-training and show their efficacy in extracting epidemic-related events for 65 diverse languages across different diseases. Experiments demonstrate that our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 (3 weeks before global discussions) from Chinese Weibo posts without any training in Chinese. Furthermore, we exploit our framework's argument extraction capabilities to aggregate community epidemic discussions like symptoms and cure measures, aiding misinformation detection and public attention monitoring. Overall, we lay a strong foundation for multilingual epidemic preparedness.

Paper Structure

This paper contains 50 sections, 14 figures, 31 tables.

Figures (14)

  • Figure 1: Zero-shot multilingual epidemic prediction in Chinese for the COVID-19 pandemic. (Top) Number of epidemic events extracted in Dec 2019-Jan 2020. Arrows indicate SPEED++ epidemic warnings. (Bottom) SPEED++ warning with respect to the general timeline of major moments of the COVID-19 pandemic.
  • Figure 2: Illustration of Event Extraction for the epidemic-related events Infect and Control. The corresponding arguments and their roles, which are absent in the SPEED dataset, are marked in dotted boxes.
  • Figure 3: Overview of the data creation process. In the main steps, we expand the ontology with argument roles, preprocess and filter the multilingual data, and have bilingual experts annotate it to create SPEED++.
  • Figure 4: Distribution of the number of arguments (# Args) per sentence for SPEED++ relative to other datasets ACE, ERE, and MEE.
  • Figure 5: Number of extracted events plotted against the number of reported cases for each country. Both axes are on a log scale.
  • ...and 9 more figures