Learning from Semi-Factuals: A Debiased and Semantic-Aware Framework for Generalized Relation Discovery

Jiaxin Wang, Lingling Zhang, Jun Liu, Tianlin Guo, Wenjun Wu

TL;DR

This work addresses open-world relation extraction by defining Generalized Relation Discovery (GRD), where unlabeled data may belong to pre-defined or novel relations, and where novel relations require explicit semantic descriptions. The authors introduce SFGRD, a two-stage framework that first generates semi-factuals via a tri-view debiased representation and then conducts semi-factual thinking through a dual-space tri-view learning system, consisting of a cluster-semantic space and a class-index space. Key contributions include formal GRD problem formulation, a novel two-stage learning paradigm with alignment and selection strategies, and extensive experiments showing improvements in both relation-label accuracy and semantic quality over strong baselines. The approach enables simultaneous label induction and semantic description for novel relations, advancing open-world RE with practical implications for knowledge graph construction and semantic discovery.

Abstract

We introduce a novel task, called Generalized Relation Discovery (GRD), for open-world relation extraction. GRD aims to identify unlabeled instances in existing pre-defined relations or discover novel relations by assigning instances to clusters as well as providing specific meanings for these clusters. The key challenges of GRD are how to mitigate the serious model biases caused by labeled pre-defined relations to learn effective relational representations and how to determine the specific semantics of novel relations during classifying or clustering unlabeled instances. We then propose a novel framework, SFGRD, for this task to solve the above issues by learning from semi-factuals in two stages. The first stage is semi-factual generation implemented by a tri-view debiased relation representation module, in which we take each original sentence as the main view and design two debiased views to generate semi-factual examples for this sentence. The second stage is semi-factual thinking executed by a dual-space tri-view collaborative relation learning module, where we design a cluster-semantic space and a class-index space to learn relational semantics and relation label indices, respectively. In addition, we devise alignment and selection strategies to integrate two spaces and establish a self-supervised learning loop for unlabeled data by doing semi-factual thinking across three views. Extensive experimental results show that SFGRD surpasses state-of-the-art models in terms of accuracy by 2.36% to 5.78% and cosine similarity by 32.19% to 84.45% for relation label index and relation semantic quality, respectively. To the best of our knowledge, we are the first to exploit the efficacy of semi-factuals in relation extraction.
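
The abstract reports two kinds of scores: accuracy for relation label indices and cosine similarity for relation semantic quality. The sketch below illustrates how such scores are commonly computed; it is not the authors' evaluation code. The function names `cosine_similarity` and `best_mapping_accuracy` are hypothetical. It is also an assumption that cluster accuracy is scored under the best one-to-one mapping from cluster ids to gold labels, which is a common convention for clustering evaluation that this summary does not confirm. The brute-force search over permutations is only practical for a handful of clusters.

```python
import math
from itertools import permutations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch (assumption)
    return dot / (norm_u * norm_v)


def best_mapping_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one map from cluster ids to labels.

    Assumption: this mapping-based scoring is a common clustering
    convention, not something the summary above specifies.
    """
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Pad with None so surplus clusters can stay unmatched.
    padded = labels + [None] * max(0, len(clusters) - len(labels))
    best = 0
    for perm in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, correct)
    return best / len(true_labels)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    gold = ["founder", "founder", "spouse", "spouse"]
    predicted_clusters = [1, 1, 0, 0]
    print(best_mapping_accuracy(gold, predicted_clusters))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

In practice the vectors passed to `cosine_similarity` would be embeddings of a predicted relational word and of the reference relation name; which embedding model is used is left open here.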

Paper Structure

This paper contains 21 sections, 10 equations, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: Illustrative examples of entity bias and context bias as well as semi-factual instances. (A) Training on labeled pre-defined relations leads to inherent model bias; (B) Models may misidentify relations in unlabeled data due to entity and context biases; (C) Semi-factual instances are generated through different debiased views.
  • Figure 2: Overview of SFGRD. The input consists of labeled and unlabeled data, with labeled data only containing pre-defined relations, and unlabeled data including both pre-defined and novel relations. After two-stage learning, SFGRD outputs label indices and relational words for unlabeled data.
  • Figure 3: Comparison of the accuracy of different models against different initial numbers of clusters on the FewRel dataset. The novel relation ratio is 20%.
  • Figure 4: Normalized label index accuracy and semantic cosine similarity of different novel relations on FewRel. The dashed lines represent the median values.