A Positive-Unlabeled Metric Learning Framework for Document-Level Relation Extraction with Incomplete Labeling

Ye Wang, Huazheng Pan, Tao Zhang, Wen Wu, Wenxin Hu

TL;DR

The paper tackles document-level relation extraction (RE) under incomplete labeling, where many true relations are left unannotated. It introduces P3M, a positive-unlabeled metric learning framework that embeds each relation, including a none-class, and combines a SoftMax_norm loss, prior shift, and a positive-unlabeled objective to pull positive entity pairs toward their relation embeddings and away from the none-class embedding. To improve generalization and mitigate labeling bias, it adds dropout-based positive augmentation (P2M) and a positive-none-class mixup (P3M) that interpolates positive embeddings with the none-class relation embedding, treated as a pseudo-negative. Empirically, P3M improves F1 by roughly 4-10 points on DocRED under incomplete labeling, reaches state-of-the-art results in fully labeled settings, and is robust to prior-estimation bias on both DocRED and ChemDisGene. These results suggest the framework suits real-world document-level RE, where annotations are often incomplete.
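
To make the "positive-unlabeled objective" more concrete, here is a minimal sketch of the generic non-negative PU risk estimator from the PU learning literature. It is not the paper's exact loss (which also involves the SoftMax_norm loss and prior shift). The function name `nn_pu_risk`, its arguments, and the example numbers are all hypothetical and chosen for illustration only.

```python
def nn_pu_risk(pos_loss_as_pos, pos_loss_as_neg, unl_loss_as_neg, prior):
    """Generic non-negative PU risk for one relation class (illustrative only).

    pos_loss_as_pos: losses of labeled positives scored against the positive label
    pos_loss_as_neg: losses of labeled positives scored against the negative label
    unl_loss_as_neg: losses of unlabeled samples scored against the negative label
    prior:           assumed class prior, i.e. the fraction of true positives
    """
    def mean(values):
        return sum(values) / len(values)

    positive_risk = prior * mean(pos_loss_as_pos)
    # Unlabeled data mixes positives and negatives, so the positive share
    # is subtracted to estimate the risk on true negatives alone.
    negative_risk = mean(unl_loss_as_neg) - prior * mean(pos_loss_as_neg)
    # Clamp at zero: an empirical negative risk below zero signals overfitting.
    return positive_risk + max(0.0, negative_risk)


# Made-up per-sample losses, purely for demonstration.
risk = nn_pu_risk([0.2, 0.4], [1.0, 3.0], [0.5, 1.5], prior=0.1)
print(round(risk, 2))
```

The estimate depends directly on `prior`, which is why robustness to prior-estimation bias matters: a misestimated prior changes how much of the unlabeled loss is attributed to hidden positives.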

Abstract

The goal of document-level relation extraction (RE) is to identify relations between entities that span multiple sentences. Recently, incomplete labeling in document-level RE has received increasing attention, and some studies have used methods such as positive-unlabeled learning to tackle this issue, but substantial room for improvement remains. Motivated by this, we propose a positive-augmentation and positive-mixup positive-unlabeled metric learning framework (P3M). Specifically, we formulate document-level RE as a metric learning problem. We aim to pull each entity-pair embedding closer to its corresponding relation embedding while pushing it farther from the none-class relation embedding. Additionally, we adapt positive-unlabeled learning to this loss objective. To improve the generalizability of the model, we use dropout to augment positive samples and propose a positive-none-class mixup method. Extensive experiments show that P3M improves the F1 score by approximately 4-10 points in document-level RE with incomplete labeling and achieves state-of-the-art results in fully labeled scenarios. Furthermore, P3M is robust to prior estimation bias in incompletely labeled scenarios.
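
The metric-learning view and the positive-none-class mixup can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: cosine similarity as the metric, the rule "predict a relation when it scores higher than the none-class", the helper names, and the 2-dimensional embeddings are all hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_relations(pair_emb, relation_embs, none_emb):
    """Keep relations that are more similar to the pair than the none-class is.

    Using the none-class similarity as a threshold is an assumption made here.
    """
    threshold = cosine(pair_emb, none_emb)
    return [name for name, emb in relation_embs.items()
            if cosine(pair_emb, emb) > threshold]


def mixup_with_none(pair_emb, none_emb, lam):
    """Interpolate a positive pair embedding with the none-class embedding."""
    return [lam * p + (1.0 - lam) * n for p, n in zip(pair_emb, none_emb)]


# Toy embeddings, invented for this example.
relation_embs = {"born_in": [1.0, 0.0], "works_for": [0.0, 1.0]}
none_emb = [0.5, 0.5]
pair_emb = [0.9, 0.1]

print(predict_relations(pair_emb, relation_embs, none_emb))  # ['born_in']

# Mixing coefficient drawn from a Beta distribution, as in standard mixup.
lam = random.betavariate(1.0, 1.0)
mixed_emb = mixup_with_none(pair_emb, none_emb, lam)
```

In standard mixup the training target is interpolated with the same coefficient, so `mixed_emb` would be only partially aligned with the positive relation. How P3M weights that target is not specified in this summary.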

Paper Structure

This paper contains 25 sections, 12 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: In the dense representation space for a specific positive relation, the P3M framework brings the positive sample (orange circle) and its augmented embedding (light orange circle) closer to the positive relation embedding (yellow pentagram), while distancing them from the none-class relation embedding (grey triangle). The unlabeled sample (light grey circle) is distanced from the positive relation and brought closer to the none-class relation. To address scarcity of positive samples, extra positive sample embeddings (light green circles) are obtained using mixup, partially aligning them with the positive relation and distancing them from the none-class relation.
  • Figure 2: Effect of hyperparameters on DocRED