Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction

Yi Luan, Luheng He, Mari Ostendorf, Hannaneh Hajishirzi

TL;DR

The paper addresses the challenge of extracting rich scientific knowledge by jointly identifying entities, relations, and cross-sentence coreference. It introduces SciERC, a dataset with joint annotations, and SciIE, an end-to-end multi-task framework that shares span representations to improve extraction across tasks and sentences. Key contributions include superior performance on SciERC relative to strong baselines, enhanced knowledge graph construction from large corpora, and evidence that coreference propagation boosts graph quality. The work enables scalable construction and analysis of scientific knowledge graphs, with implications for information discovery and trend analysis in research domains.

Abstract

We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.

Paper Structure

This paper contains 33 sections, 5 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Example annotation: phrases that refer to the same scientific concept are annotated into the same coreference cluster, such as MORphological PArser MORPA, it and MORPA (marked as red).
  • Figure 2: Overview of the multitask setup, where all three tasks are treated as classification problems on top of shared span representations. Dotted arcs indicate the normalization space for each task.
  • Figure 3: Knowledge graph construction process.
  • Figure 4: A part of an automatically constructed scientific knowledge graph with the most frequent neighbors of the scientific term statistical machine translation (SMT) on the graph. For simplicity we denote Used-for (Reverse) as Uses, Evaluated-for (Reverse) as Evaluated-by, and replace common terms with their acronyms. The original graph and more examples are given in the Appendix.
  • Figure 5: Frequency of detected entities with and without coreference resolution: using coreference reduces the frequency of generic phrase detection while significantly increasing the frequency of specific phrases. Linking entities through coreference helps disambiguate phrases when generating the knowledge graph.
  • ...and 5 more figures