Augmenting Document-level Relation Extraction with Efficient Multi-Supervision
Xiangyu Lin, Weijia Jia, Zhiguo Gong
TL;DR
The paper addresses the inefficiency and noise of using distant supervision (DS) for document-level relation extraction (DocRE). It introduces Efficient Multi-Supervision (EMS), which combines two components. Document Informativeness Ranking (DIR) augments training with only the most informative DS documents. Multi-Supervision Ranking-based Loss (MSRL) robustly fuses distant, expert, and self supervision to mitigate label noise. On DocRED, EMS achieves F1 scores competitive with or better than full-DS pretraining and other DS-based baselines at a dramatically lower time cost, and ablations show that both DIR and MSRL are necessary. The approach offers a practical, scalable way to exploit large DS datasets for DocRE in real-world settings, balancing accuracy and efficiency.
Abstract
Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely used in existing work on document-level relation extraction because it is noisy and has low information density. Where it is used, distantly supervised data is mostly consumed as a whole for pretraining, which is time-inefficient. To enable efficient and robust use of distantly supervised training data, we propose Efficient Multi-Supervision for document-level relation extraction. We first select a subset of informative documents from the massive dataset by combining distant supervision with expert supervision. We then train the model with a Multi-Supervision Ranking Loss that integrates knowledge from multiple sources of supervision to alleviate the effects of noise. Experiments show that our method improves model performance with higher time efficiency than existing baselines.
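The abstract describes the two stages only at a high level, so the following is a minimal sketch under loudly stated assumptions, not the paper's method. The informativeness score is a hypothetical stand-in: it counts the distantly supervised triples that an expert model also predicts with probability at least 0.5. The ranking loss is an ATLOP-style adaptive-threshold loss swapped in for illustration. The fixed per-source weights are also hypothetical. All function names (`informativeness`, `select_documents`, `ranking_loss`, `multi_supervision_loss`) are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def informativeness(ds_triples: Set[Triple],
                    expert_probs: Dict[Triple, float],
                    threshold: float = 0.5) -> int:
    """HYPOTHETICAL score: number of distantly supervised triples that the
    expert model also predicts with probability >= threshold."""
    return sum(1 for t in ds_triples if expert_probs.get(t, 0.0) >= threshold)


def select_documents(docs: Dict[str, Tuple[Set[Triple], Dict[Triple, float]]],
                     k: int) -> List[str]:
    """Return the ids of the k highest-scoring documents (ties broken by id)."""
    ranked = sorted(docs, key=lambda d: (-informativeness(*docs[d]), d))
    return ranked[:k]


def _log_softmax_at(logits: Sequence[float], idx: Sequence[int], target: int) -> float:
    """Numerically stable log-softmax over logits[idx], evaluated at `target`."""
    m = max(logits[i] for i in idx)
    log_z = m + math.log(sum(math.exp(logits[i] - m) for i in idx))
    return logits[target] - log_z


def ranking_loss(logits: Sequence[float], positives: Set[int]) -> float:
    """ATLOP-style adaptive-threshold loss (illustrative stand-in).
    Index 0 is the threshold class: positives should score above it,
    negatives below it."""
    negatives = [i for i in range(1, len(logits)) if i not in positives]
    pos_idx = [0] + sorted(positives)
    loss = 0.0
    for p in positives:  # push each positive above the threshold
        loss -= _log_softmax_at(logits, pos_idx, p)
    # push the threshold above every negative
    loss -= _log_softmax_at(logits, [0] + negatives, 0)
    return loss


def multi_supervision_loss(logits: Sequence[float],
                           label_sets: Dict[str, Set[int]],
                           weights: Dict[str, float]) -> float:
    """HYPOTHETICAL fusion: weighted sum of the ranking loss under each
    supervision source's labels (e.g. "distant", "expert", "self")."""
    return sum(weights[s] * ranking_loss(logits, label_sets[s]) for s in label_sets)


if __name__ == "__main__":
    docs = {
        "d1": ({("A", "B", "founded_by"), ("A", "C", "located_in")},
               {("A", "B", "founded_by"): 0.9, ("A", "C", "located_in"): 0.7}),
        "d2": ({("X", "Y", "member_of")}, {("X", "Y", "member_of"): 0.8}),
        "d3": ({("P", "Q", "spouse")}, {}),
    }
    print(select_documents(docs, k=2))  # ['d1', 'd2']
```

Selecting a fixed top-k subset before training is what makes this kind of scheme cheaper than pretraining on the full distantly supervised corpus. The weighted sum is only one simple way to let noisy distant labels be tempered by expert and self supervision.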
