EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains

Frank Mtumbuka, Steven Schockaert

TL;DR

This work addresses data sparsity and label noise in fine-grained entity typing by pre-training an entity encoder on coreference chains. It introduces a contrastive learning objective that pulls coreferring mentions together, and it reduces noise by keeping only the coreference links on which two independent off-the-shelf coreference systems agree. The pre-trained encoder, paired with a simple linear classifier, achieves state-of-the-art results on OntoNotes and FIGER, and strong performance on ACE 2005. The study highlights the importance of high-quality coreference signals and suggests extending the approach to cross-sentence context and ultra-fine typing in future work.

Abstract

Entity typing is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity typing (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this process by pre-training an entity encoder such that embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. The main problem with this strategy, which helps to explain why it has not previously been considered, is that predicted coreference links are often too noisy. We show that this problem can be addressed by using a simple trick: we only consider coreference links that are predicted by two different off-the-shelf systems. With this prudent use of coreference links, our pre-training strategy allows us to improve the state-of-the-art in benchmarks on fine-grained entity typing, as well as traditional entity extraction.
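
The "two systems must agree" filter described in the abstract can be illustrated with a minimal sketch. It assumes that each coreference system returns its chains as lists of mention spans, and that two mentions match only when their (start, end) token offsets are identical. The function names and the toy chains are hypothetical and are not taken from the paper's implementation.

```python
from itertools import combinations


def chain_links(chains):
    """Return the set of unordered mention pairs linked by the given chains.

    Each chain is a list of (start, end) token spans (an assumed format).
    """
    links = set()
    for chain in chains:
        # Sort so that every pair has a canonical (smaller, larger) order.
        for a, b in combinations(sorted(set(chain)), 2):
            links.add((a, b))
    return links


def agreed_links(chains_a, chains_b):
    """Keep only the links predicted by both systems (set intersection)."""
    return chain_links(chains_a) & chain_links(chains_b)


# Toy output of two hypothetical coreference systems for one document.
system_a = [[(0, 2), (5, 6), (9, 10)], [(3, 4), (12, 13)]]
system_b = [[(0, 2), (5, 6)], [(3, 4), (9, 10), (12, 13)]]

links = agreed_links(system_a, system_b)
```

Here the two systems disagree about the mention at span (9, 10), so every link involving it is dropped, and only the two pairs that both systems predict remain as positive pairs for pre-training.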

Paper Structure (26 sections, 3 equations, 3 figures, 8 tables)

This paper contains 26 sections, 3 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Illustration of our proposed strategy. In the first step, an off-the-shelf coreference resolution method is used to identify coreference chains in stories. In the second step, we use contrastive learning to train an encoder which maps mentions from the same coreference chain to similar vectors. In the third step, we use standard training data to learn a linear classifier for each considered entity type.
  • Figure 2: Comparison of the percentage of correct predictions per gold label by the MLM-only and EnCore models (with roberta-large) on the OntoNotes test set. The instances of a label that are accurately predicted are expressed as a percentage of the total number of occurrences of the corresponding gold label.
  • Figure 3: Comparison of the percentage of correct predictions per gold label by the MLM-only and EnCore models (with roberta-large) on the FIGER test set. The instances of a label that are accurately predicted are expressed as a percentage of the total number of occurrences of the corresponding gold label.
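
The second step in the Figure 1 caption (contrastive learning over mentions from the same coreference chain) can be sketched as follows. The exact loss is not given in this summary, so the sketch assumes an InfoNCE-style objective with cosine similarity and a temperature of 0.1. The function names, the temperature, and the toy vectors are all illustrative assumptions rather than the paper's actual configuration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor mention (assumed objective).

    `positive` is the embedding of a mention from the same coreference
    chain; `negatives` are embeddings of mentions from other chains.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-d mention embeddings (illustrative values only).
anchor = [1.0, 0.0]
same_chain = [0.9, 0.1]
other_chains = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, same_chain, other_chains)
loss_misaligned = info_nce(anchor, other_chains[0], [same_chain, other_chains[1]])
```

The loss is small when the coreferring mention is the anchor's nearest neighbour and large when a mention from another chain is closer, which is the behaviour a contrastive pre-training objective of this kind is meant to reward.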