Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

Hao Zhang, Yang Liu, Xiaoyan Liu, Tianming Liang, Gaurav Sharma, Liang Xue, Maozu Guo

TL;DR

This paper addresses the challenge of noisy labels in distantly supervised biomedical relation extraction by introducing GBRE, a graph-based framework that treats a bag of sentences as a fully connected graph and enables inter-sentence message passing. It combines query-sentence attention to suppress sentence noise with a graph-based intra-bag attention to capture inter-sentence dependencies, followed by selective bag-level attention for final relation classification, trained via cross-entropy. Empirical results on BioRel, TBGA, and NYT-10 show GBRE achieves state-of-the-art performance across biomedical and general DSRE benchmarks, with ablations confirming the effectiveness of both QS_ATT and BAG_ATT. The work provides a universal, external-knowledge-free approach that improves noise robustness and inter-sentence reasoning, with potential for integration with external knowledge and BioBERT-based pipelines in future work.

Abstract

We introduce a novel graph-based framework for alleviating key challenges in distantly-supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data. Specifically, we propose a graph view of sentence bags referring to an entity pair, which enables message-passing based aggregation of information related to the entity pair over the sentence bag. The proposed framework alleviates the common problem of noisy labeling in distantly supervised relation extraction and also effectively incorporates inter-dependencies between sentences within a bag. Extensive experiments on two large-scale biomedical relation datasets and the widely utilized NYT dataset demonstrate that our proposed framework significantly outperforms the state-of-the-art methods for biomedical distant supervision relation extraction while also providing excellent performance for relation extraction in the general text mining domain.

Paper Structure (17 sections, 20 equations, 7 figures, 13 tables)

Figures (7)

  • Figure 1: The proposed graph-based relation extraction framework (GBRE). Query-sentence attention couples a query vector with a sentence vector to produce a set of query-aware feature vectors for the sentence. A sentence encoder is used to obtain the sentence representations. The bag self-attention layer extracts the relevance between sentences within a bag and exploits inter-sentence information by viewing the sentence bag as a graph. A selective attention layer obtains the sentence bag representation by computing a weighted sum of the sentence representations. A final classifier predicts the relations mentioned in the sentence bag.
  • Figure 2: Sentence to query ($sq$) attention score computation.
  • Figure 3: Query to sentence ($qs$) attention score computation.
  • Figure 4: Sentence bag graph structure. Each node $s_{i}$ denotes the corresponding sentence, and the sentence bag is viewed as a graph.
  • Figure 5: PR curves over the BioRel and TBGA datasets for the proposed GBRE model and for several prior methods. The proposed GBRE model exhibits the best performance on both datasets. GBRE-BERT denotes the BERT-based variant of GBRE, that is, the proposed GBRE model with BERT as the encoder layer. (a) Non-BERT models on the BioRel dataset. (b) BERT-based models on the BioRel dataset. (c) Non-BERT models on the TBGA dataset. (d) BERT-based models on the TBGA dataset.
  • ...and 2 more figures
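
The last two stages named in the Figure 1 caption, message passing over a fully connected sentence bag followed by selective attention, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product scores with no learned projections, and the function names, the toy sentence vectors, and the relation query vector are all hypothetical, chosen only to show the data flow.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a new vector."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def bag_self_attention(sentences):
    """One round of message passing on a fully connected bag graph.

    Every sentence (node) attends to every sentence in the bag, itself
    included. Assumption: scaled dot-product scores, no learned weights.
    """
    scale = math.sqrt(len(sentences[0]))
    updated = []
    for s_i in sentences:
        weights = softmax([dot(s_i, s_j) / scale for s_j in sentences])
        updated.append(weighted_sum(weights, sentences))
    return updated


def selective_attention(sentences, relation_query):
    """Collapse a bag into one vector via attention against a query vector."""
    alphas = softmax([dot(s, relation_query) for s in sentences])
    return alphas, weighted_sum(alphas, sentences)


# Toy bag: three 3-dimensional sentence vectors (hypothetical values).
bag = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [-0.2, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]  # hypothetical relation query vector

updated = bag_self_attention(bag)
alphas, bag_vector = selective_attention(updated, query)
print("attention weights:", alphas)
print("bag representation:", bag_vector)
```

In this toy bag the third sentence points away from the query, so it receives the smallest selective-attention weight. That is the intended effect of down-weighting a likely noisy sentence when the bag representation is formed.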