Learning from Semi-Factuals: A Debiased and Semantic-Aware Framework for Generalized Relation Discovery
Jiaxin Wang, Lingling Zhang, Jun Liu, Tianlin Guo, Wenjun Wu
TL;DR
This work addresses open-world relation extraction by defining Generalized Relation Discovery (GRD), in which unlabeled data may belong to either pre-defined or novel relations, and novel relations require explicit semantic descriptions. The authors introduce SFGRD, a two-stage framework that first generates semi-factuals via a tri-view debiased representation and then conducts semi-factual thinking through a dual-space tri-view learning system made up of a cluster-semantic space and a class-index space. Key contributions include a formal formulation of the GRD problem, a novel two-stage learning paradigm with alignment and selection strategies, and extensive experiments showing improvements in both relation-label accuracy and semantic quality over strong baselines. The approach enables simultaneous label induction and semantic description for novel relations, advancing open-world RE with practical implications for knowledge graph construction and semantic discovery.
Abstract
We introduce a novel task, Generalized Relation Discovery (GRD), for open-world relation extraction. GRD aims to assign unlabeled instances to existing pre-defined relations or to discover novel relations by grouping instances into clusters and providing specific meanings for these clusters. The key challenges of GRD are how to mitigate the serious model biases caused by labeled pre-defined relations so as to learn effective relational representations, and how to determine the specific semantics of novel relations while classifying or clustering unlabeled instances. To address these issues, we propose a novel framework, SFGRD, which learns from semi-factuals in two stages. The first stage is semi-factual generation, implemented by a tri-view debiased relation representation module in which we take each original sentence as the main view and design two debiased views to generate semi-factual examples for this sentence. The second stage is semi-factual thinking, executed by a dual-space tri-view collaborative relation learning module in which we design a cluster-semantic space and a class-index space to learn relational semantics and relation label indices, respectively. In addition, we devise alignment and selection strategies to integrate the two spaces and establish a self-supervised learning loop for unlabeled data by performing semi-factual thinking across the three views. Extensive experimental results show that SFGRD surpasses state-of-the-art models by 2.36\%$\sim$5.78\% in accuracy on relation label indices and by 32.19\%$\sim$84.45\% in cosine similarity on relation semantic quality. To the best of our knowledge, we are the first to exploit the efficacy of semi-factuals in relation extraction.
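The abstract reports two kinds of scores: accuracy for relation label indices and cosine similarity for relation semantic quality. The sketch below illustrates how such scores could be computed in principle; it is not the authors' evaluation code. It assumes that accuracy is measured after finding the best one-to-one mapping between predicted cluster ids and gold labels (a common convention when cluster ids are arbitrary, found here by brute force), and that semantic quality compares a predicted relation embedding with a gold one. The function names `cosine_similarity` and `best_mapping_accuracy` are hypothetical.

```python
import math
from itertools import permutations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_mapping_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping from predicted ids to gold labels.

    Assumption: cluster ids carry no meaning, so every mapping is tried.
    Brute force is only practical for a handful of clusters.
    """
    if not y_true:
        return 0.0
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    # Pad with None so surplus predicted clusters can stay unmatched.
    candidates = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(candidates, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, correct)
    return best / len(y_true)


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 2, 2]
    pred = [2, 2, 0, 0, 1, 0]  # same grouping up to renaming, with one mistake
    print(best_mapping_accuracy(gold, pred))
    print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))
```

For realistic numbers of relations, the brute-force search would normally be replaced by an assignment solver such as the Hungarian algorithm; the scoring logic stays the same.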
