VERITAS-NLI : Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference

Arjun Shah, Hetansh Shah, Vedica Bafna, Charmi Khandor, Sindhu Nair

TL;DR

This work proposes a novel solution that combines web-scraping techniques with Natural Language Inference models: it retrieves the external knowledge needed to verify a headline and checks the headline against it, achieving higher accuracy than both the best classical Machine Learning model and Bidirectional Encoder Representations from Transformers (BERT).

Abstract

Today, information spreads rapidly through online platforms, and the rise of fake news poses an alarming threat to the integrity of public discourse, societal trust, and reputed news sources. Classical machine learning and Transformer-based models have been extensively studied for the task of fake news detection; however, they are hampered by their reliance on training data and are unable to generalize to unseen headlines. To address these challenges, we propose a novel solution that leverages web-scraping techniques and Natural Language Inference (NLI) models to retrieve the external knowledge necessary for verifying the accuracy of a headline. Our system is evaluated on a diverse self-curated evaluation dataset spanning multiple news channels and broad domains. Our best-performing pipeline achieves an accuracy of 84.3%, surpassing the best classical Machine Learning model by 33.3% and Bidirectional Encoder Representations from Transformers (BERT) by 31.0%. This highlights the efficacy of combining dynamic web-scraping with Natural Language Inference to find support for a claimed headline in the corresponding externally retrieved knowledge for the task of fake news detection.
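The core idea of checking a headline against retrieved text can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 threshold, the "any passage entails the headline" decision rule, and the token-overlap scorer (`toy_nli`, a stand-in for a real NLI model) are all assumptions made for illustration, and the web-scraping step is replaced by a hand-supplied list of passages.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an NLI scorer maps (premise, hypothesis)
# to probabilities for the three standard NLI labels.
NliScorer = Callable[[str, str], Dict[str, float]]


def toy_nli(premise: str, hypothesis: str) -> Dict[str, float]:
    """Stand-in for a real NLI model: treats token overlap as entailment."""
    premise_tokens = set(re.findall(r"[a-z0-9]+", premise.lower()))
    hypothesis_tokens = set(re.findall(r"[a-z0-9]+", hypothesis.lower()))
    if not hypothesis_tokens:
        return {"entailment": 0.0, "neutral": 1.0, "contradiction": 0.0}
    # Fraction of headline tokens that also appear in the passage.
    overlap = len(hypothesis_tokens & premise_tokens) / len(hypothesis_tokens)
    return {"entailment": overlap, "neutral": 1.0 - overlap, "contradiction": 0.0}


def verify_headline(
    headline: str,
    evidence: List[str],
    nli: NliScorer = toy_nli,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[str, float]:
    """Label a headline by its best entailment score over retrieved passages."""
    # Each passage is the premise; the headline is the hypothesis.
    best = max((nli(passage, headline)["entailment"] for passage in evidence),
               default=0.0)
    label = "reliable" if best >= threshold else "unreliable"
    return label, best


if __name__ == "__main__":
    passages = ["NASA launches a new mission to the Moon this week"]
    print(verify_headline("NASA launches new Moon mission", passages))
```

In this sketch a headline with no retrieved evidence defaults to "unreliable"; a real system would swap `toy_nli` for a trained NLI model and populate `evidence` from scraped search results.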

Paper Structure

This paper contains 24 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Label distribution of the LIAR dataset. Claims are labeled "True", "Mostly True", "Half True", "Barely True", "False", or "Pants on Fire"; each label denotes the credibility of the corresponding headline in the dataset.
  • Figure 2: Example of an Unreliable Headline generated using Microsoft's Phi-3, when prompted with our zero-shot prompting approach.
  • Figure 3: Fine-tuning of BERT via transfer learning on the LIAR dataset to ascertain the veracity of claimed headlines.
  • Figure 4: Example of a question generated by Mistral 7B to aid in the knowledge retrieval, a crucial step of the Small Language Model Pipeline.
  • Figure 5: Architecture Diagram of our proposed solution VERITAS-NLI, detailing the workflow of our 3 proposed pipelines.
  • ...and 2 more figures