Unsupervised Pretraining for Fact Verification by Language Model Distillation

Adrián Bazaga; Pietro Liò; Gos Micklem

TL;DR

SFAVEL (Self-supervised Fact Verification via Language Model Distillation) is a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations.

Abstract

Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state of the art on FB15k-237 (+5.3% Hits@1) and FEVER (+8% accuracy) with linear evaluation.
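To make the idea of a contrastive objective over claim and fact features concrete, the following is a minimal sketch of a generic InfoNCE-style loss. It is not the loss proposed in the paper, whose exact form is not given here; the function names `cosine` and `info_nce`, the temperature value, and the toy vectors are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(claim, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's loss).

    claim, positive: feature vectors for a claim and a matching fact.
    negatives: list of feature vectors for non-matching facts.
    """
    # Index 0 holds the positive pair; the rest are negatives.
    logits = [cosine(claim, positive) / temperature]
    logits += [cosine(claim, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive fact.
    return log_denominator - logits[0]

# Toy example: the claim aligns with the first fact, not the second.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setting the loss is small when the claim feature points towards its positive fact and large when it points towards a negative, which is the behaviour a contrastive alignment objective is meant to reward.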

Paper Structure (29 sections, 8 equations, 4 figures, 5 tables)

Introduction
Related Work
Fact verification with pre-trained language models
Unsupervised pre-training methods for fact verification
Knowledge distillation
Overview of the approach
Data processing pipeline
Pretraining method
Generation of negative instances
In-batch negatives
In-knowledge-base negatives
Claim-Fact Matching via Language Model Distillation
Claim-Fact Distillation
Intra-Sample Contrastive Loss
Scoring Loss
...and 14 more sections

Figures (4)

Figure 1: (a) A high-level overview of the SFAVEL framework. Given a textual claim, we use a frozen language model (orange box) to obtain its embedding features, $X^{LM}$. The knowledge base is fed to the knowledge model to produce a knowledge base embedding $X^{F}$. Then, the scoring module produces scores for facts in the knowledge base, conditioned upon the claim embedding. The positive sub-graph formed by the top $K$ facts is kept, denoted as $X^{F^{+}}$. Next, a negative pool of instances $\mathcal{N}$ is generated. Finally, both the positive and negative sub-graphs are encoded with the knowledge model, obtaining the positive and negative sub-graph embeddings, $X^{F^{+}}$ and $X^{F^{-}}$, and their respective scores, $S^{+}$ and $S^{-}$. Grey boxes represent the three different components of our self-supervised loss function used to train the knowledge model. (b) Optional supervised fine-tuning stage on a downstream task using the pre-trained model.
Figure 2: Low-data experiments by fine-tuning on 1%, 5%, 10% and 15% of data over 3 different language backbones.
Figure 2: Accuracy of linear classification results on FEVER dev set using different pre-trained language models as backbone for distillation.
Figure 3: Label accuracy with different $K$ for number of facts selected after scoring.
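The scoring and top-$K$ selection step described in the Figure 1 caption can be sketched as follows. This is only an illustration under stated assumptions: the dot-product scoring function, the helper names `score_facts` and `top_k_facts`, and the toy embeddings are hypothetical and are not taken from the paper, which does not specify the scoring module here.

```python
def score_facts(claim_emb, fact_embs):
    """Score each fact against the claim (assumed: plain dot product)."""
    return [sum(c * f for c, f in zip(claim_emb, fact)) for fact in fact_embs]

def top_k_facts(claim_emb, fact_embs, k):
    """Return indices and scores of the k highest-scoring facts."""
    scores = score_facts(claim_emb, fact_embs)
    # Rank fact indices by descending score and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:k]
    return kept, [scores[i] for i in kept]

# Toy claim embedding and three toy fact embeddings.
claim = [1.0, 0.0]
facts = [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5]]
positive_idx, positive_scores = top_k_facts(claim, facts, k=2)
```

Here `positive_idx` plays the role of the positive sub-graph selection, and `k` corresponds to the $K$ varied in the label-accuracy figure.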
