A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction

Erica Cai, Brendan O'Connor

TL;DR

This work addresses zero-shot dyadic event extraction with a fine-grained, multi-stage, instruction-following LM pipeline that uses Monte Carlo sampling to manage the nondeterminism of generative outputs. The pipeline generates robust trigger synonyms, disambiguates event cues in context, extracts dyadic arguments, and optionally detects affiliations to higher-level entities. By explicitly separating event detection, argument extraction, and affiliation detection, the approach offers more control and interpretability than purely neural methods. It substantially outperforms naive zero-shot prompting on ACE and on real-world data, and it greatly improves efficiency by filtering text with candidate triggers before querying the LM. Intrinsic evaluations on ACE show the method approaching supervised EE in accuracy while requiring only a small fraction of the LM queries used by prior zero-shot textual entailment (TE) approaches. An extension to affiliation detection for international relations, including a case study on 1980s proxy conflicts, illustrates practical sociopolitical analysis and the method's flexibility for real-world semantic extraction tasks. Overall, the pipeline provides a scalable, interpretable, and adaptable framework for zero-shot EE with potential for broader semantic extraction applications.
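One way to picture the Monte Carlo idea is to query the LM several times for trigger synonyms and aggregate the samples. The sketch below is a minimal illustration under stated assumptions: `monte_carlo_synonyms` and `make_fake_lm` are hypothetical names, the LM is replaced by a seeded random stub, and the frequency-threshold aggregation is an assumption rather than the paper's documented rule.

```python
import random
from collections import Counter
from typing import Callable, List


def monte_carlo_synonyms(
    sample_fn: Callable[[], List[str]],
    num_samples: int,
    min_fraction: float,
) -> List[str]:
    """Aggregate repeated, nondeterministic LM samples into one synonym list.

    Assumed rule (not from the paper): keep a word if it appears in at least
    `min_fraction` of the samples.
    """
    counts: Counter = Counter()
    for _ in range(num_samples):
        # Normalize and deduplicate so each sample votes at most once per word.
        counts.update({w.strip().lower() for w in sample_fn()})
    threshold = min_fraction * num_samples
    return sorted(w for w, c in counts.items() if c >= threshold)


def make_fake_lm(seed: int) -> Callable[[], List[str]]:
    """Hypothetical stand-in for a sampled LM call (no real model is queried)."""
    rng = random.Random(seed)
    core = ["injure", "wound", "hurt"]  # terms the stub proposes almost always
    noise = ["maim", "bruise", "strike", "damage"]  # occasional, less stable terms

    def sample() -> List[str]:
        out = [w for w in core if rng.random() < 0.9]
        out += [w for w in noise if rng.random() < 0.2]
        return out

    return sample


if __name__ == "__main__":
    lm = make_fake_lm(seed=0)
    print(monte_carlo_synonyms(lm, num_samples=50, min_fraction=0.5))
```

Choosing `min_fraction` so that the threshold equals one sample keeps the union of all samples, which favors recall; larger values behave like a vote that suppresses unstable outputs. Which aggregation the authors actually use is not stated in this summary.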

Abstract

Current social science efforts automatically populate event databases of "who did what to whom?" tuples by applying event extraction (EE) to text such as news. The event databases are used to analyze sociopolitical dynamics between actor pairs (dyads) in, e.g., international relations. While most EE methods rely heavily on rules or supervised learning, zero-shot event extraction could potentially allow researchers to flexibly specify arbitrary event classes for new research questions. Unfortunately, we find that current zero-shot EE methods, as well as a naive zero-shot approach of simple generative language model (LM) prompting, perform poorly for dyadic event extraction; most suffer from word sense ambiguity, modality sensitivity, and computational inefficiency. We address these challenges with a new fine-grained, multi-stage, instruction-following generative LM pipeline, proposing a Monte Carlo approach to deal with, and even take advantage of, the nondeterminism of generative outputs. Our pipeline includes explicit stages of linguistic analysis (synonym generation, contextual disambiguation, argument realization, event modality), improving control and interpretability compared to purely neural methods. This method outperforms other zero-shot EE approaches, and outperforms naive applications of generative LMs by at least 17 F1 percentage points. The pipeline's filtering mechanism greatly improves computational efficiency, allowing it to issue as few as 12% of the queries used by a previous zero-shot method. Finally, we demonstrate our pipeline's application to dyadic international relations analysis.
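The filtering mechanism can be sketched as follows: only sentences containing a candidate trigger (for example, one of the generated synonyms) are passed on to the LM stages. This is a hypothetical illustration: `filter_candidates`, `query_fraction`, exact surface-form matching, and the "one query per sentence and event class" baseline are assumptions made here, not the paper's implementation or its query accounting.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(sentence: str) -> Set[str]:
    """Lowercased alphabetic tokens; a real system might lemmatize instead."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def filter_candidates(
    sentences: List[str], triggers: Dict[str, Set[str]]
) -> List[Tuple[int, str, str]]:
    """Return (sentence index, event class, matched trigger) for every match.

    Only these candidates would reach the more expensive LM stages
    (disambiguation, argument extraction, and so on).
    """
    candidates = []
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        for event_class, words in triggers.items():
            for word in sorted(words & tokens):
                candidates.append((i, event_class, word))
    return candidates


def query_fraction(sentences: List[str], triggers: Dict[str, Set[str]]) -> float:
    """Filtered queries relative to one query per (sentence, event class) pair."""
    exhaustive = len(sentences) * len(triggers)
    return len(filter_candidates(sentences, triggers)) / exhaustive


if __name__ == "__main__":
    sentences = [
        "Two soldiers were wounded in the attack.",
        "The ministers met in Geneva on Tuesday.",
        "Markets closed higher.",
        "Rain is expected tomorrow.",
    ]
    triggers = {
        "Injure": {"wounded", "injured", "hurt"},
        "Meet": {"met", "meeting", "summit"},
    }
    print(filter_candidates(sentences, triggers))
    print(query_fraction(sentences, triggers))
```

On this toy input, two of the eight exhaustive (sentence, event class) pairs survive, so the sketch issues 25% of the queries. Because matching is on exact surface forms, inflected triggers such as "wounded" must appear in the synonym list; the 12% figure in the abstract comes from the paper's own experiments, not from this example.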

Paper Structure

This paper contains 19 sections, 1 equation, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Example of zero-shot dyadic event extraction. Text from the New York Times, July 14, 1987 (Sandhaus, New York Times Annotated Corpus).
  • Figure 2: This work's multi-stage LM pipeline, where the event class for our running example from Fig. 1 is Injure.
  • Figure 3: Prompt-based pipeline for event detection (see the method section on event detection) on the running example from Fig. 1.
  • Figure 4: Prompt-based pipeline for argument extraction (see the method section on argument extraction) on the running example from Fig. 1.
  • Figure 5: Recall vs. compute cost for Injure (left) and Meet (right) in ACE for different temperatures.
  • ...and 2 more figures