TruthLens: A Training-Free Paradigm for DeepFake Detection

Ritabrata Chakraborty, Rajatsubhra Chakraborty, Ali Khaleghi Rahimian, Thomas MacDougall

TL;DR

Deepfake detection traditionally relies on opaque binary classifiers that offer limited explanations. TruthLens reframes detection as a training-free visual question-answering task: large vision-language models (LVLMs) observe visual artifacts, and large language models (LLMs) reason over those observations to reach and justify a decision in human-readable form. The method uses a four-step pipeline (Question Generation, Multimodal Reasoning, Textual Aggregation, and Final Decision Making) and reports superior AUC and accuracy relative to conventional detectors on the LDM and ProGAN datasets while remaining interpretable. Because it requires no training, the paradigm offers scalable detection that adapts to evolving synthetic media and promotes trust through transparent, evidence-based justifications.
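The four-step pipeline can be read as a simple composition of functions. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the model interfaces `lvlm` and `llm` are hypothetical callables standing in for real LVLM/LLM API calls, the question list is illustrative, and the "first line is the verdict" reply format is an assumed prompt convention.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interfaces; a real system would call an LVLM / LLM API here.
LVLM = Callable[[str, str], str]  # (image_path, question) -> free-text answer
LLM = Callable[[str], str]        # (prompt) -> free-text completion


@dataclass
class Verdict:
    label: str        # "real" or "fake"
    explanation: str  # human-readable justification produced by the LLM


def generate_questions() -> List[str]:
    # Step 1: Question Generation (illustrative, artifact-focused questions).
    return [
        "Are there irregularities in the texture of the skin?",
        "Is the lighting on the face consistent with the background?",
        "Are the eyes, teeth, and ears rendered without distortions?",
    ]


def multimodal_reasoning(image: str, questions: List[str], lvlm: LVLM) -> List[str]:
    # Step 2: Multimodal Reasoning. The LVLM answers each question about the image.
    return [lvlm(image, q) for q in questions]


def textual_aggregation(questions: List[str], answers: List[str]) -> str:
    # Step 3: Textual Aggregation. Merge question/answer pairs into one evidence report.
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))


def final_decision(report: str, llm: LLM) -> Verdict:
    # Step 4: Final Decision Making. The LLM reasons over the aggregated report.
    prompt = (
        "Based on the following observations, is the image real or fake? "
        "Answer 'real' or 'fake' on the first line, then justify.\n\n" + report
    )
    reply = llm(prompt)
    first_line, _, rest = reply.partition("\n")
    label = first_line.strip().lower().rstrip(".")
    if label not in ("real", "fake"):
        # Assumed reply format was not followed; fail loudly rather than guess.
        raise ValueError(f"unexpected verdict line: {first_line!r}")
    return Verdict(label=label, explanation=rest.strip())


def truthlens_sketch(image: str, lvlm: LVLM, llm: LLM) -> Verdict:
    # Run the four steps in order on a single image.
    questions = generate_questions()
    answers = multimodal_reasoning(image, questions, lvlm)
    report = textual_aggregation(questions, answers)
    return final_decision(report, llm)
```

Keeping each step a separate function mirrors the training-free design: swapping in a different LVLM or LLM only changes the callables passed in, not the pipeline. In practice you would pass stub functions for testing and real API wrappers for deployment.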

Abstract

The proliferation of synthetic images generated by advanced AI models poses significant challenges in identifying and understanding manipulated visual content. Current fake image detection methods predominantly rely on binary classification models that focus on accuracy while often neglecting interpretability, leaving users without clear insights into why an image is deemed real or fake. To bridge this gap, we introduce TruthLens, a novel training-free framework that reimagines deepfake detection as a visual question-answering (VQA) task. TruthLens utilizes state-of-the-art large vision-language models (LVLMs) to observe and describe visual artifacts and combines this with the reasoning capabilities of large language models (LLMs) like GPT-4 to analyze and aggregate evidence into informed decisions. By adopting a multimodal approach, TruthLens seamlessly integrates visual and semantic reasoning to not only classify images as real or fake but also provide interpretable explanations for its decisions. This transparency enhances trust and provides valuable insights into the artifacts that signal synthetic content. Extensive evaluations demonstrate that TruthLens outperforms conventional methods, achieving high accuracy on challenging datasets while maintaining a strong emphasis on explainability. By reframing deepfake detection as a reasoning-driven process, TruthLens establishes a new paradigm in combating synthetic media, combining cutting-edge performance with interpretability to address the growing threats of visual disinformation.
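The evaluation reports accuracy and AUC. To make these metrics concrete, the sketch below computes both from per-image "fake" scores. It assumes labels of 1 for fake and 0 for real, a detector that emits a score in [0, 1], and a 0.5 decision threshold; none of these choices is taken from the paper. AUC is computed directly from its probabilistic definition: the chance that a randomly chosen fake image scores higher than a randomly chosen real one, with ties counted as one half.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    # labels: 1 = fake, 0 = real; predict "fake" when score >= threshold.
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # AUC = P(score of a random fake > score of a random real); ties count 1/2.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one real and one fake example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(accuracy(labels, scores))  # 0.5
    print(roc_auc(labels, scores))   # 0.75
```

The pairwise loop is quadratic and only suited to illustration; for full datasets, `sklearn.metrics.roc_auc_score` computes the same quantity efficiently. Note that AUC is threshold-free, whereas accuracy depends on the chosen threshold.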

Paper Structure

This paper contains 25 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the detection pipeline used in the TruthLens framework.
  • Figure 2: Overview of the evaluation dataset: on the left, real images from the FFHQ dataset [ffhq]; on the right, ProGAN-generated images from the ForgeryNet dataset [forgerynet] and images generated by a Latent Diffusion Model (LDM) [LDM].
  • Figure 3: A visualization of the yes/no prompts given to LVLMs, and their responses.
  • Figure 4: A visualization of the output of the model for each of the prompts, and the LVLM's final verdict on whether each sample is real or fake.
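Figures 3 and 4 show yes/no prompts posed to LVLMs and their free-text responses. One simple way to turn such replies into structured evidence is sketched below. This is a hypothetical illustration, not the paper's procedure: it assumes the verdict word is the first word of each reply, that every prompt is phrased so "yes" means an artifact was observed, and it reports the share of "yes" answers as a hypothetical `artifact_rate`.

```python
import re
from typing import Dict, List, Optional

_YES = {"yes", "y", "true"}
_NO = {"no", "n", "false"}


def parse_yes_no(answer: str) -> Optional[bool]:
    # Read the first word of the reply; return None when it is neither yes nor no.
    match = re.match(r"\s*([A-Za-z]+)", answer)
    if match is None:
        return None
    word = match.group(1).lower()
    if word in _YES:
        return True
    if word in _NO:
        return False
    return None


def summarize(answers: List[str]) -> Dict[str, float]:
    # Assumption: "yes" to any prompt means the LVLM reported an artifact.
    parsed = [parse_yes_no(a) for a in answers]
    yes = sum(1 for p in parsed if p is True)
    no = sum(1 for p in parsed if p is False)
    unclear = sum(1 for p in parsed if p is None)
    answered = yes + no
    return {
        "yes": yes,
        "no": no,
        "unclear": unclear,
        "artifact_rate": yes / answered if answered else 0.0,
    }


if __name__ == "__main__":
    replies = ["Yes, the edges are blurry.", "No.", "It is hard to say."]
    print(summarize(replies))
```

Tracking "unclear" replies separately, rather than forcing them into yes or no, keeps hedged LVLM answers visible to the downstream reasoning step instead of silently biasing the tally.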