FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

Gautam Kishore Shahi, Amit Kumar Jaiswal, Thomas Mandl

TL;DR

The paper tackles misinformation surrounding the 2023 Israel-Hamas war by constructing FakeClaim, the first public multilingual dataset of factual claims drawn from 60 fact-checking organizations in 30 languages, linked to YouTube videos and user engagement signals. It uses the AMUSED framework to collect data and formulates a multimodal YouTube fake-news detection task leveraging video text, comments, and background evidence. The strongest result comes from fine-tuning the Universal Sentence Encoder, achieving a Macro F1 of $0.87$ for fake content, with additional gains when including claims and evidence; the study also provides baseline comparisons and ethical considerations. The dataset and models are publicly available, enabling future research to improve debunking of fake videos and advance cross-platform misinformation analysis in conflict contexts.

Abstract

We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We used a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion. The dataset is available on GitHub: https://github.com/Gautamshahi/FakeClaim
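
For readers unfamiliar with the reported metric, the following is a minimal sketch of how Macro F1 is computed: the F1 score is calculated separately for each class and then averaged without weighting by class size. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper or its dataset.

```python
def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (hypothetical, not drawn from FakeClaim).
y_true = ["fake", "fake", "fake", "other", "other", "other"]
y_pred = ["fake", "fake", "other", "other", "other", "fake"]
score = macro_f1(y_true, y_pred)
print(f"Macro F1: {score:.2f}")
```

Because each class contributes equally to the average, Macro F1 does not reward a classifier for doing well only on the majority class, which matters when fake and non-fake videos are unevenly represented.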

Paper Structure (9 sections, 1 equation, 1 figure, 3 tables)

This paper contains 9 sections, 1 equation, 1 figure, 3 tables.

Figures (1)

  • Figure 1: The research framework used for extracting the dataset and classification problem.