Textual Entailment for Effective Triple Validation in Object Prediction

Andrés García-Silva, Cristian Berrío, José Manuel Gómez-Pérez

TL;DR

This work tackles open-world object prediction for knowledge graphs by using textual entailment to validate candidate triples generated from language models and other sources. The proposed SATORI framework retrieves premises from the web, generates candidate objects from LMs, KGs, and NER, and uses an entailment model to check whether the premises entail each candidate triple, accepting a candidate only when its entailment score passes a per-relation threshold. Empirical results show substantial gains in precision and overall F1 across training regimes, with the strongest performance achieved when combining language models with existing knowledge bases and NER, especially under low-data conditions. The findings indicate that entailment-based validation effectively filters out irrelevant candidates and that a hybrid approach drawing on multiple information sources yields the best object prediction performance in open-world KBP settings.

Abstract

Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such an approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines. However, prompt-based fact retrieval can be brittle and depends heavily on the prompts and context used, which may produce results that are unintended or hallucinatory. We propose to use textual entailment to validate facts extracted from language models through cloze statements. Our results show that triple validation based on textual entailment improves language model predictions in different training regimes. Furthermore, we show that entailment-based triple validation is also effective for validating candidate facts extracted from other sources, including existing knowledge graphs and text passages where named entities are recognized.
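
The validation step summarized in the TL;DR can be illustrated with a minimal sketch. This is not the SATORI implementation. The hypothesis template, the `validate_candidates` function, the max-over-premises aggregation, the default threshold, and the `toy_entail_score` function (a token-overlap stand-in for a real entailment model) are all assumptions introduced here for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical template that verbalizes a triple as a hypothesis sentence.
TEMPLATES: Dict[str, str] = {"PersonInstrument": "{subject} plays {object}."}


def validate_candidates(
    subject: str,
    relation: str,
    candidates: List[str],
    premises: List[str],
    entail_score: Callable[[str, str], float],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,  # assumed fallback, not from the paper
) -> List[str]:
    """Keep candidate objects whose triple is entailed by at least one premise."""
    template = TEMPLATES[relation]
    threshold = thresholds.get(relation, default_threshold)
    accepted = []
    for obj in candidates:
        hypothesis = template.format(subject=subject, object=obj)
        # Assumption: a candidate is scored by its best-supporting premise.
        best = max((entail_score(p, hypothesis) for p in premises), default=0.0)
        if best >= threshold:
            accepted.append(obj)
    return accepted


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an entailment model: share of hypothesis tokens found in the premise."""
    hyp = set(hypothesis.lower().rstrip(".").split())
    prem = set(premise.lower().rstrip(".").split())
    return len(hyp & prem) / len(hyp) if hyp else 0.0


premises = ["John Lennon plays guitar and piano on many recordings."]
accepted = validate_candidates(
    "John Lennon",
    "PersonInstrument",
    ["guitar", "drums"],
    premises,
    toy_entail_score,
    thresholds={"PersonInstrument": 0.9},
)
print(accepted)
```

In practice the scoring function would wrap a trained entailment model, and the per-relation thresholds would be tuned on held-out data rather than set by hand.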

Paper Structure

This paper contains 16 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: SATORI architecture, illustrated with the input pair John Lennon (subject) and PersonInstrument (relation).
  • Figure 2: Object prediction evaluation on the LM_KBC22 dataset.