An Improved Baseline for Sentence-level Relation Extraction
Wenxuan Zhou, Muhao Chen
TL;DR
The paper targets two bottlenecks in sentence-level relation extraction (RE): entity representation and noisy or ill-defined labels. It introduces typed entity markers, which mark the subject and object mentions together with their NER types directly in the input text, and pairs them with a RoBERTa-LARGE backbone. The resulting baseline achieves state-of-the-art performance on TACRED, TACREV, and Re-TACRED (notably 91.1% F1 on Re-TACRED). Ablation and robustness analyses show that explicit, comprehensive entity cues improve generalization to unseen entities and reduce the impact of label noise. The authors run extensive experiments and release their code to support further research on RE with pretrained language models (PLMs).
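To make the typed-marker idea concrete, here is a minimal sketch of how a tokenized sentence might be wrapped with markers that carry both the entity role (subject or object) and its NER type. The marker strings `<S:TYPE>` / `</S:TYPE>` and `<O:TYPE>` / `</O:TYPE>`, the span format, and the function name `add_typed_markers` are illustrative assumptions, not necessarily the exact tokens or code in the authors' release.

```python
from typing import List, Tuple

# Hypothetical span format: (start_index, end_index_inclusive, ner_type)
Span = Tuple[int, int, str]


def add_typed_markers(tokens: List[str], subj: Span, obj: Span) -> List[str]:
    """Wrap the subject and object spans with typed entity markers.

    The marker strings used here are an assumption for illustration;
    any tokens that encode both the entity role and its NER type fit the idea.
    """
    s_start, s_end, s_type = subj
    o_start, o_end, o_type = obj
    # This sketch assumes the two spans do not overlap.
    if not (s_end < o_start or o_end < s_start):
        raise ValueError("subject and object spans must not overlap")

    out: List[str] = []
    for i, tok in enumerate(tokens):
        # Opening markers go right before the first token of each span.
        if i == s_start:
            out.append(f"<S:{s_type}>")
        if i == o_start:
            out.append(f"<O:{o_type}>")
        out.append(tok)
        # Closing markers go right after the last token of each span.
        if i == s_end:
            out.append(f"</S:{s_type}>")
        if i == o_end:
            out.append(f"</O:{o_type}>")
    return out


if __name__ == "__main__":
    tokens = ["Bill", "was", "born", "in", "Seattle", "."]
    marked = add_typed_markers(tokens, (0, 0, "PERSON"), (4, 4, "CITY"))
    print(" ".join(marked))
```

Running the example prints `<S:PERSON> Bill </S:PERSON> was born in <O:CITY> Seattle </O:CITY> .`. In a real pipeline, marker strings like these would typically be registered as new tokens in the PLM tokenizer (or replaced by existing punctuation) so that they are not split into subwords.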
Abstract
Sentence-level relation extraction (RE) aims to identify the relationship between two entities in a sentence. Many efforts have been devoted to this problem, yet the best-performing methods are still far from perfect. In this paper, we revisit two problems that affect the performance of existing RE models, namely entity representation and noisy or ill-defined labels. Our improved RE baseline, which incorporates entity representations with typed markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous SOTA methods. Furthermore, the new baseline achieves an F1 of 91.1% on the refined Re-TACRED dataset, demonstrating that pretrained language models (PLMs) achieve high performance on this task. We release our code to the community for future research.
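The F1 figures above follow the usual TACRED-style convention: micro-averaged precision and recall computed over positive relation labels, with `no_relation` treated as the negative class. The sketch below illustrates that computation; the function name and the choice to return 0.0 when there are no positive predictions or gold labels are assumptions here, and the official scorer may handle such edge cases differently.

```python
from typing import Sequence

NO_RELATION = "no_relation"


def tacred_micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro F1 over positive relations, ignoring the no_relation class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # A prediction counts as correct only if it is a positive label matching gold.
    correct = sum(1 for g, p in zip(gold, pred) if p != NO_RELATION and p == g)
    predicted = sum(1 for p in pred if p != NO_RELATION)
    actual = sum(1 for g in gold if g != NO_RELATION)
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["per:city_of_birth", "no_relation", "org:founded_by", "no_relation"]
    pred = ["per:city_of_birth", "org:founded_by", "no_relation", "no_relation"]
    print(tacred_micro_f1(gold, pred))
```

In this toy example, one of two positive predictions is correct (precision 0.5) and one of two gold positives is recovered (recall 0.5), so the printed F1 is 0.5. Correctly predicted `no_relation` instances contribute nothing to the score, which is why this metric is stricter than plain accuracy on TACRED.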
