CEO: Corpus-based Open-Domain Event Ontology Induction
Nan Xu, Hongming Zhang, Jianshu Chen
TL;DR
CEO tackles open-domain event understanding without relying on a fixed ontology by inducing a hierarchical, human-readable event ontology directly from corpora. It combines corpus-wide salient-event detection, distantly supervised by summaries, with WordNet-guided external knowledge, which a reconstruction-plus-contrastive autoencoder uses to align event embeddings with the WordNet hierarchy. A key novelty is generating interpretable names for ontology nodes through in-context learning with GPT-J-6B, which makes the induced ontology easier to curate and verify. Experiments on ACE2005, MAVEN, and RAMS show improved coverage and accuracy in ontology induction, and experiments on 11 Allsides corpora yield hierarchical ontologies with high-quality, contextually appropriate type names, highlighting the approach's potential for open-domain event understanding and downstream NLP tasks.
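The alignment objective can be illustrated with a minimal pure-Python sketch of a reconstruction-plus-contrastive loss. It assumes each event already has an input vector, a latent code from an encoder, and a reconstruction from a decoder, and that each event is mapped to a WordNet sense with pre-computed hop distances (`senses`, `wordnet_dist`). The names, the `max_hops` threshold, the weight `alpha`, and the classic margin-based contrastive form are illustrative assumptions, not CEO's published loss.

```python
import math
from itertools import combinations


def squared_error(x, x_hat):
    """Mean squared reconstruction error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def euclidean(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_contrastive(z_i, z_j, is_positive, margin=1.0):
    """Margin-based contrastive loss: pull positive pairs together,
    push negative pairs at least `margin` apart (assumed form)."""
    d = euclidean(z_i, z_j)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


def combined_loss(inputs, latents, recons, senses, wordnet_dist,
                  max_hops=2, margin=1.0, alpha=1.0):
    """Reconstruction loss plus alpha * contrastive loss over all event pairs.

    senses:       hypothetical WordNet sense id per event, e.g. "attack.v.01"
    wordnet_dist: hypothetical dict frozenset({sense_a, sense_b}) -> hop distance
    Pairs within `max_hops` hops in WordNet are treated as positives.
    """
    n = len(inputs)
    if n == 0:
        return 0.0
    rec = sum(squared_error(x, xh) for x, xh in zip(inputs, recons)) / n
    pairs = list(combinations(range(n), 2))
    con = 0.0
    for i, j in pairs:
        if senses[i] == senses[j]:
            hops = 0
        else:
            hops = wordnet_dist.get(frozenset((senses[i], senses[j])), math.inf)
        con += pair_contrastive(latents[i], latents[j], hops <= max_hops, margin)
    con /= max(len(pairs), 1)
    return rec + alpha * con
```

For example, with events sensed as `attack.v.01`, `assault.v.01` (one hop apart) and `marry.v.01`, latents that place the two attack-like events close together and `marry.v.01` beyond the margin leave only a small positive-pair penalty, whereas pushing the attack-like events apart would increase the loss.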
Abstract
Existing event-centric NLP models often apply only to a pre-defined ontology, which significantly restricts their generalization capabilities. This paper presents CEO, a novel Corpus-based Event Ontology induction model that relaxes the restriction imposed by pre-defined event ontologies. Without direct supervision, CEO leverages distant supervision from available summary datasets to detect corpus-wide salient events, and it exploits external event knowledge to force events that lie within a short distance of each other in that knowledge to have close embeddings. Experiments on three popular event datasets show that the schema induced by CEO has better coverage and higher accuracy than previous methods. Moreover, CEO is the first event ontology induction model that can induce a hierarchical event ontology with meaningful names on eleven open-domain corpora, making the induced schema more trustworthy and easier to curate further.
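The distant-supervision idea can be pictured with a small sketch: triggers that also appear in a document's reference summary become salient training labels, and corpus-level salience is approximated by counting how many documents a trigger is predicted salient in. Exact surface-form matching, the `min_docs` threshold, and all function names are hypothetical simplifications; the abstract does not specify how CEO matches triggers to summaries or aggregates predictions, and in practice one would lemmatize so that "attack" and "attacked" match.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a crude stand-in for a real tokenizer/lemmatizer."""
    return re.findall(r"[a-z]+", text.lower())


def distant_salience_labels(triggers, summary):
    """Hypothetical labeling rule: a trigger is salient (1) if it also
    appears in the document's reference summary, else 0."""
    summary_tokens = set(tokenize(summary))
    return [1 if trigger.lower() in summary_tokens else 0 for trigger in triggers]


def corpus_salient_events(doc_predictions, min_docs=2):
    """Aggregate per-document salience into corpus-level salience.

    doc_predictions: list of (triggers, labels) pairs, where labels would come
    from a salience model trained on the distant labels above.
    Returns triggers predicted salient in at least `min_docs` documents.
    """
    counts = Counter()
    for triggers, labels in doc_predictions:
        # Count each salient trigger at most once per document.
        counts.update({t.lower() for t, y in zip(triggers, labels) if y == 1})
    return sorted(t for t, c in counts.items() if c >= min_docs)
```

For instance, for the summary "Rebels attacked the base and killed two guards." the triggers `["attacked", "said", "killed"]` receive labels `[1, 0, 1]`, so the reporting verb "said" is not treated as salient.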
