Unsupervised Pretraining for Fact Verification by Language Model Distillation

Adrián Bazaga, Pietro Liò, Gos Micklem

TL;DR

The paper proposes SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations.

Abstract

Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on FB15k-237 (+5.3% Hits@1) and FEVER (+8% accuracy) with linear evaluation.

Paper Structure (29 sections, 8 equations, 4 figures, 5 tables)

This paper contains 29 sections, 8 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: (a) A high-level overview of the SFAVEL framework. Given a textual claim, we use a frozen language model (orange box) to obtain its embedding features, $X^{LM}$. The knowledge base is fed to the knowledge model to produce a knowledge base embedding $X^{F}$. Then, the scoring module produces scores for facts in the knowledge base, conditioned upon the claim embedding. The positive sub-graph formed by the top $K$ facts is kept, denoted as $X^{F^{+}}$. Next, a negative pool of instances $\mathcal{N}$ is constructed. Finally, both the positive and negative sub-graphs are encoded with the knowledge model, obtaining the positive and negative sub-graph embeddings, $X^{F^{+}}$ and $X^{F^{-}}$, and their respective scores, $S^{+}$ and $S^{-}$. Grey boxes represent the three different components of our self-supervised loss function used to train the knowledge model. (b) Optional supervised fine-tuning stage on a downstream task using the pre-trained model.
  • Figure 2: Low-data experiments by fine-tuning on 1%, 5%, 10% and 15% of data over 3 different language backbones.
  • Figure 2: Accuracy of linear classification results on FEVER dev set using different pre-trained language models as backbone for distillation.
  • Figure 3: Label accuracy with different $K$ for number of facts selected after scoring.
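
The scoring and top-$K$ selection step described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation. It assumes cosine similarity as the scoring rule, treats every fact outside the top $K$ as a negative (the paper instead builds a dedicated negative pool $\mathcal{N}$), and uses a generic InfoNCE-style contrastive loss as a stand-in for the paper's loss. All function names and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_facts(claim_emb, fact_embs):
    """Score each fact against the claim (assumed rule: cosine similarity)."""
    return [cosine(claim_emb, f) for f in fact_embs]


def top_k_indices(scores, k):
    """Indices of the k highest-scoring facts, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def contrastive_loss(pos_scores, neg_scores, temperature=0.1):
    """Generic InfoNCE-style loss: small when positives outscore negatives."""
    neg_mass = sum(math.exp(s / temperature) for s in neg_scores)
    total = 0.0
    for s in pos_scores:
        pos_mass = math.exp(s / temperature)
        total += -math.log(pos_mass / (pos_mass + neg_mass))
    return total / len(pos_scores)


# Toy 2-d embeddings standing in for X^{LM} (claim) and X^{F} (facts).
claim = [1.0, 0.0]
facts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

scores = score_facts(claim, facts)
positive_idx = top_k_indices(scores, k=2)  # top-K facts form the positive set
# Assumption: the remaining facts act as negatives in this toy example.
negative_idx = [i for i in range(len(facts)) if i not in positive_idx]

loss = contrastive_loss(
    [scores[i] for i in positive_idx],
    [scores[i] for i in negative_idx],
)
```

In this toy setup the loss shrinks as the selected facts score higher than the negatives, which is the behaviour a contrastive objective of this kind is meant to encourage. The value of `k` plays the role of the $K$ varied in the label-accuracy figure.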