GLiDRE: Generalist Lightweight model for Document-level Relation Extraction
Robin Armingaud, Romaric Besançon
TL;DR
GLiDRE addresses document-level relation extraction under data scarcity with a compact bi-encoder that encodes documents and relation labels separately. It builds relation representations from pooled entity mentions and can optionally enrich them with localized context pooling. The model is trained with focal loss and pretrained on synthetic data generated by a large language model (LLM). It achieves state-of-the-art few-shot performance on benchmarks derived from Re-DocRED and competitive results in the fully supervised setting, while being substantially more efficient than much larger LLMs in zero-shot scenarios. This approach demonstrates that small, specialized encoders can rival much larger models on complex document-level information extraction (IE) tasks, enabling practical deployment with limited compute and labeled data.
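The TL;DR names the model's components without giving their equations. The plain-Python sketch below illustrates one plausible reading under stated assumptions, not the authors' implementation. Mentions are merged with dimension-wise log-sum-exp pooling, and the optional context vector is computed in the style of ATLOP's localized context pooling from per-token attention weights; both are borrowed from prior document-level RE work and are assumptions here. Head and tail vectors are combined by an element-wise product, a hypothetical simplification. Each relation label is scored by a dot product with its separately encoded embedding, as in a generic bi-encoder. Training uses the standard binary focal loss. All function names are illustrative, and plain lists stand in for encoder outputs.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def logsumexp_pool(mentions: Sequence[Vector]) -> Vector:
    """Dimension-wise smooth max over mention embeddings (assumed pooling choice)."""
    pooled = []
    for values in zip(*mentions):  # iterate over embedding dimensions
        top = max(values)  # subtract the max for numerical stability
        pooled.append(top + math.log(sum(math.exp(v - top) for v in values)))
    return pooled


def localized_context(tokens: Sequence[Vector], head_attn: Vector,
                      tail_attn: Vector) -> Vector:
    """ATLOP-style context: weight tokens that both entities attend to (assumption)."""
    weights = [h * t for h, t in zip(head_attn, tail_attn)]
    total = sum(weights) or 1.0  # avoid division by zero if no overlap
    weights = [w / total for w in weights]
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


def relation_scores(head: Vector, tail: Vector, labels: Sequence[Vector],
                    context: Optional[Vector] = None) -> List[float]:
    """Dot-product score between one entity-pair vector and each label embedding."""
    pair = [h * t for h, t in zip(head, tail)]  # hypothetical head/tail combination
    if context is not None:
        pair = [p + c for p, c in zip(pair, context)]  # optional context enrichment
    return [sum(p * l for p, l in zip(pair, label)) for label in labels]


def focal_loss(logit: float, target: int, gamma: float = 2.0) -> float:
    """Binary focal loss on one logit; gamma=0 recovers binary cross-entropy."""
    s = logit if target == 1 else -logit  # logit of the true class
    # Numerically stable log(sigmoid(s)) = -softplus(-s).
    log_p_t = -(max(-s, 0.0) + math.log1p(math.exp(-abs(s))))
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t
```

For example, `relation_scores([1, 2], [3, 1], [[1, 0], [0, 1]])` returns `[3, 2]`. With `gamma=0`, `focal_loss` reduces to binary cross-entropy; a larger `gamma` shrinks the loss on confidently correct predictions, which matters when most entity pairs in a document hold no relation.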
Abstract
Relation Extraction (RE) is a fundamental task in Natural Language Processing, and its document-level variant poses significant challenges due to complex interactions between entities across sentences. While supervised models have achieved strong results in fully resourced settings, their behavior with limited training data remains insufficiently studied. We introduce GLiDRE, a new compact model for document-level relation extraction, designed to work efficiently in both supervised and few-shot settings. Experiments on both low-resource supervised training and few-shot meta-learning benchmarks show that our approach outperforms existing methods in data-constrained scenarios, establishing a new state of the art in few-shot document-level relation extraction. Our code will be made publicly available.
