Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction
Hao Zhang, Yang Liu, Xiaoyan Liu, Tianming Liang, Gaurav Sharma, Liang Xue, Maozu Guo
TL;DR
This paper addresses the challenge of noisy labels in distantly supervised relation extraction (DSRE) for biomedical text by introducing GBRE, a graph-based framework that treats the bag of sentences referring to an entity pair as a fully connected graph and enables message passing between sentences. GBRE combines a query-sentence attention (QS_ATT) that suppresses sentence noise with a graph-based intra-bag attention that captures inter-sentence dependencies. A selective bag-level attention then produces the bag representation used for final relation classification, and the model is trained with a cross-entropy loss. Experiments on BioRel, TBGA, and NYT-10 show that GBRE achieves state-of-the-art performance on both biomedical and general-domain DSRE benchmarks, and ablations confirm the contributions of both QS_ATT and BAG_ATT. The work provides a general approach that requires no external knowledge and improves both robustness to label noise and inter-sentence reasoning; integration with external knowledge and BioBERT-based pipelines is left to future work.
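The TL;DR describes a three-stage bag encoder. The following is a minimal, dependency-free Python sketch of that data flow under clearly stated assumptions: each sentence arrives as a list of already-encoded token vectors (the sentence encoder itself is omitted), every attention score is a plain dot product with no trainable weights, the graph stage is a single round of scaled dot-product message passing with a residual update, and relation embeddings act as the queries of the selective bag-level attention. All function names (`qs_attention`, `graph_message_passing`, `bag_logits`, `forward`) are hypothetical. This illustrates the idea only and is not the authors' GBRE implementation.

```python
# Hypothetical sketch of a GBRE-style bag encoder; NOT the authors' code.
# Assumptions: pre-encoded token vectors, dot-product attention without
# learned parameters, one residual message-passing round, and relation
# embeddings used as queries for the selective bag-level attention.
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vecs: List[Vec]) -> Vec:
    dim = len(vecs[0])
    return [sum(w * v[k] for w, v in zip(weights, vecs)) for k in range(dim)]


def qs_attention(tokens: List[Vec], query: Vec) -> Vec:
    """Pool one sentence's tokens, weighting each token by its match with the query."""
    weights = softmax([dot(query, t) for t in tokens])
    return weighted_sum(weights, tokens)


def graph_message_passing(nodes: List[Vec]) -> List[Vec]:
    """One round of attention-weighted message passing on a fully connected bag graph.

    Each sentence node attends to every node in the bag (itself included) and
    adds the aggregated message to its own vector as a residual update.
    """
    scale = math.sqrt(len(nodes[0]))
    updated = []
    for n_i in nodes:
        weights = softmax([dot(n_i, n_j) / scale for n_j in nodes])
        message = weighted_sum(weights, nodes)
        updated.append([a + b for a, b in zip(n_i, message)])
    return updated


def bag_logits(nodes: List[Vec], relation_embs: List[Vec]) -> List[float]:
    """Selective bag-level attention: each relation attends over sentences, then scores the bag."""
    logits = []
    for r in relation_embs:
        weights = softmax([dot(n, r) for n in nodes])
        bag_vec = weighted_sum(weights, nodes)
        logits.append(dot(bag_vec, r))
    return logits


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-likelihood of the gold relation under a softmax over logits."""
    return -math.log(softmax(logits)[label])


def forward(bag: List[List[Vec]], query: Vec, relation_embs: List[Vec]) -> List[float]:
    sentences = [qs_attention(tokens, query) for tokens in bag]  # stage 1: QS attention
    nodes = graph_message_passing(sentences)                     # stage 2: intra-bag graph
    return bag_logits(nodes, relation_embs)                      # stage 3: selective attention


if __name__ == "__main__":
    toy_bag = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]  # two sentences, toy 2-d tokens
    toy_relations = [[1.0, 0.0], [0.0, 1.0]]           # hypothetical relation embeddings
    logits = forward(toy_bag, [1.0, 0.0], toy_relations)
    print(logits, cross_entropy(logits, 0))
```

In this sketch, a sentence that is dissimilar to the rest of the bag contributes little to its neighbours' messages, and the selective attention further weights sentences by how well they match each candidate relation. A real model would add learned projections and a trained sentence encoder at each stage.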
Abstract
We introduce a novel graph-based framework for alleviating key challenges in distantly supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data. Specifically, we propose a graph view of the sentence bag referring to an entity pair, which enables message-passing-based aggregation of information about the entity pair across the sentences in the bag. The proposed framework alleviates the common problem of noisy labeling in distantly supervised relation extraction and also effectively incorporates inter-dependencies between sentences within a bag. Extensive experiments on two large-scale biomedical relation datasets (BioRel and TBGA) and the widely used NYT-10 dataset demonstrate that our proposed framework significantly outperforms state-of-the-art methods for biomedical distant supervision relation extraction while also providing excellent performance for relation extraction in the general text mining domain.
