Large Language Models and Provenance Metadata for Determining the Relevance of Images and Videos in News Stories

Tomas Peterka, Matyas Bohacek

TL;DR

The paper addresses the detection of misinformation that exploits multimodal media by using provenance metadata within a large language model (LLM) framework. The framework ingests a news article, the captions of its attached media, and the media's provenance metadata to assess whether the media are relevant to the story and whether they have been tampered with; it outputs location relevance, tampering status, and an overall relevance verdict. A prototype is implemented with Newspaper4k for article scraping, the C2PA standard for provenance, and the Phi-3 LLM behind a Gradio-based web interface, and it is released as open source under the MIT license. The work acknowledges limitations such as LLM hallucinations, sparse adoption of provenance metadata in practice, the lack of dedicated evaluation datasets, and potential biases, and it outlines concrete directions for future evaluation and dataset creation.

Abstract

The most effective misinformation campaigns are multimodal, often combining text with images and videos taken out of context -- or fabricating them entirely -- to support a given narrative. Contemporary methods for detecting misinformation, whether in deepfakes or text articles, often miss the interplay between multiple modalities. Built around a large language model, the system proposed in this paper addresses these challenges. It analyzes both the article's text and the provenance metadata of included images and videos to determine whether they are relevant. We open-source the system prototype and interactive web interface.
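As a rough illustration of the data flow described above (article text, media caption, and provenance metadata going in; location relevance, tampering status, and an overall verdict coming out), the following minimal sketch assembles a prompt and parses a structured reply. It is not the authors' implementation: the `MediaItem` type, the prompt wording, the provenance field names, and the JSON reply format are all assumptions made for illustration, and no LLM or C2PA library is called.

```python
import json
from dataclasses import dataclass

# Hypothetical names throughout: none of these types, keys, or prompt strings
# come from the paper; they only illustrate the described inputs and outputs.
VERDICT_KEYS = ("location_relevant", "tampered", "relevant")


@dataclass
class MediaItem:
    caption: str
    provenance: dict  # assumed to be provenance metadata already parsed into a dict


def build_prompt(article_text: str, media: MediaItem) -> str:
    """Combine the article, the caption, and the provenance metadata into one prompt."""
    provenance_json = json.dumps(media.provenance, sort_keys=True)
    return (
        "Decide whether the media item belongs in the news story.\n\n"
        f"ARTICLE:\n{article_text}\n\n"
        f"CAPTION:\n{media.caption}\n\n"
        f"PROVENANCE METADATA (JSON):\n{provenance_json}\n\n"
        "Reply in JSON with boolean keys: " + ", ".join(VERDICT_KEYS) + "."
    )


def parse_verdict(llm_reply: str) -> dict:
    """Parse the model's reply; any missing or malformed field becomes None (unknown)."""
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return {key: None for key in VERDICT_KEYS}
    if not isinstance(data, dict):
        return {key: None for key in VERDICT_KEYS}
    return {
        key: data[key] if isinstance(data.get(key), bool) else None
        for key in VERDICT_KEYS
    }


if __name__ == "__main__":
    item = MediaItem(
        caption="Flooded street after the storm",
        provenance={"location": "Kyiv", "edited": False},  # made-up example values
    )
    print(build_prompt("Heavy rain flooded parts of Kyiv on Monday.", item))
    # Stand-in for a model reply; a real system would send the prompt to an LLM.
    print(parse_verdict('{"location_relevant": true, "tampered": false, "relevant": true}'))
```

Treating unparseable or incomplete replies as unknown, rather than as a negative verdict, is one way to keep an LLM formatting failure from being mistaken for a finding about the media.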

Paper Structure

This paper contains 11 sections and 2 figures.

Figures (2)

  • Figure 1: Screenshots of the (a) full article and (b) URL (simplified) input interface in the prototype web interface.
  • Figure 2: Screenshots of the prototype web interface displaying a result in which the media were found (a) relevant and (b) not relevant to the news story. The chat interface allows for submitting follow-up questions to the LLM.