CEO: Corpus-based Open-Domain Event Ontology Induction

Nan Xu, Hongming Zhang, Jianshu Chen

TL;DR

CEO tackles the problem of open-domain event understanding without relying on a fixed ontology by inducing a hierarchical, human-readable event ontology directly from corpora. It combines corpus-wide salient-event detection via distant supervision from summaries with WordNet-guided external knowledge, implemented through a reconstruction-plus-contrastive autoencoder to align event embeddings with the WordNet hierarchy. A key novelty is generating interpretable names for ontology nodes using in-context learning with GPT-J-6B, enabling easier curation and verification. Empirical results on ACE2005, MAVEN, RAMS, and 11 Allsides corpora show improved coverage and accuracy in ontology induction and high-quality, contextually appropriate type names, highlighting the approach’s potential for open-domain event understanding and downstream NLP tasks.

Abstract

Existing event-centric NLP models often apply only to a pre-defined ontology, which significantly restricts their generalization capabilities. This paper presents CEO, a novel Corpus-based Event Ontology induction model to relax the restriction imposed by pre-defined event ontologies. Without direct supervision, CEO leverages distant supervision from available summary datasets to detect corpus-wise salient events and exploits external event knowledge to force events within a short distance to have close embeddings. Experiments on three popular event datasets show that the schema induced by CEO has better coverage and higher accuracy than previous methods. Moreover, CEO is the first event ontology induction model that can induce a hierarchical event ontology with meaningful names on eleven open-domain corpora, making the induced schema more trustworthy and easier to curate further.

Paper Structure

This paper contains 35 sections, 1 equation, 8 figures, and 15 tables.

Figures (8)

  • Figure 1: Instances from the Covid-19 corpus with the event types induced by previous work and the ontology induced by CEO. The non-salient event treatment in S4 is disregarded while the others are preserved. Event type induction identifies only events triggered by verbs (S1, S2, S3), not nouns (S4), and arranges events into simple clusters. CEO recognizes both verb- and noun-triggered events, induces a tree-structured ontology, and provides concrete names.
  • Figure 2: Framework of the proposed CEO. Step 1: extract events triggered by nouns or verbs; Step 2: preserve salient events with distant supervision from summaries; Step 3: improve event representations for hierarchical clustering with external event knowledge from WordNet; Step 4: generate event type names with in-context learning.
  • Figure 3: Event ontology induced by Ward linkage on ACE2005. Each leaf node represents one event mention and is colored by its actual coarsest event type: Life, Personnel, Justice, Conflict, Transaction, Movement, Contact, Business. The ontology hierarchies of the other two datasets are visualized in \ref{fig:datasets_full}.
  • Figure 4: Impact of different ways of using external WordNet knowledge on hierarchical clustering (purity with Ward linkage). When both the reconstruction and contrastive losses are employed, we also show the influence of the distance threshold. Dasgupta costs are omitted because their variation is statistically insignificant.
  • Figure 5: The proposed autoencoder model to improve event embeddings by leveraging external knowledge. The typical autoencoder architecture is optimized with the weighted sum of reconstruction loss and contrastive triplet margin loss (left). The event mention triplet in the form of <anchor, positive, negative> is selected based on the $d$-distance, which is calculated according to the pre-defined ontology of WordNet (right).
  • ...and 3 more figures
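
The training objective described in the Figure 5 caption (a weighted sum of a reconstruction loss and a contrastive triplet margin loss) can be illustrated with a small sketch. This is not the authors' implementation: the mean-squared-error reconstruction term, the Euclidean distance, and the `weight` and `margin` parameters are assumptions chosen for illustration, and the exact weighting scheme used in the paper is not stated here.

```python
import math


def mse(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther
    from the anchor than the positive is."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def combined_loss(x, x_hat, anchor, positive, negative, weight=0.5, margin=1.0):
    """Reconstruction loss plus a weighted triplet term.

    `weight` and `margin` are hypothetical hyperparameters, not values
    taken from the paper.
    """
    return mse(x, x_hat) + weight * triplet_margin(anchor, positive, negative, margin)


if __name__ == "__main__":
    x, x_hat = [1.0, 2.0], [1.0, 4.0]
    anchor, positive = [0.0, 0.0], [0.0, 1.0]
    # A distant negative already satisfies the margin, so only reconstruction counts.
    print(combined_loss(x, x_hat, anchor, positive, [3.0, 4.0]))
    # A close negative violates the margin and adds a penalty.
    print(combined_loss(x, x_hat, anchor, positive, [0.0, 1.5]))
```

In this sketch the triplet term only contributes when the negative embedding is not yet far enough from the anchor, which is the mechanism that pulls WordNet-close event mentions together and pushes distant ones apart.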