How Effective Are Publicly Accessible Deepfake Detection Tools? A Comparative Evaluation of Open-Source and Free-to-Use Platforms

Michael Rettinger, Ben Beaumont, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le

TL;DR

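This paper presents the first cross-paradigm evaluation of six publicly accessible deepfake detection tools, spanning two complementary approaches: forensic analysis tools and AI-based classifiers. It reports three principal findings: forensic tools exhibit high recall but poor specificity, while AI classifiers demonstrate the inverse pattern; human evaluators substantially outperform all automated tools; and human-AI disagreement is asymmetric, with human judgment prevailing in the vast majority of discordant cases.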

Abstract

The proliferation of deepfake imagery poses escalating challenges for practitioners tasked with verifying digital media authenticity. While detection algorithm research is abundant, empirical evaluations of publicly accessible tools that practitioners actually use remain scarce. This paper presents the first cross-paradigm evaluation of six tools, spanning two complementary detection approaches: forensic analysis tools (InVID & WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Both tool categories were evaluated by professional investigators with law enforcement experience using blinded protocols across datasets comprising authentic, tampered, and AI-generated images sourced from DF40, CelebDF, and CASIA-v2. We report three principal findings: forensic tools exhibit high recall but poor specificity, while AI classifiers demonstrate the inverse pattern; human evaluators substantially outperform all automated tools; and human-AI disagreement is asymmetric, with human judgment prevailing in the vast majority of discordant cases. We discuss implications for practitioner workflows and identify critical gaps in current detection capabilities.
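
The recall/specificity contrast in the first finding can be made concrete with a minimal sketch. It assumes the standard definitions, with "fake" (tampered or AI-generated) as the positive class. The function name and the toy labels are hypothetical, chosen only for illustration; they are not data or code from the paper.

```python
def recall_and_specificity(y_true, y_pred):
    """Return (recall, specificity) for boolean labels, True = fake (positive class).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # fakes flagged as fake
    fn = sum(1 for t, p in pairs if t and not p)      # fakes missed
    tn = sum(1 for t, p in pairs if not t and not p)  # real images passed as real
    fp = sum(1 for t, p in pairs if not t and p)      # real images wrongly flagged

    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


# Toy labels (illustrative only): four fake images followed by four real ones.
y_true = [True, True, True, True, False, False, False, False]

# A detector that flags almost everything: high recall, poor specificity.
over_flagging = [True, True, True, True, True, True, True, False]

# A detector that rarely flags anything: the inverse pattern.
under_flagging = [True, False, False, False, False, False, False, False]

print(recall_and_specificity(y_true, over_flagging))   # (1.0, 0.25)
print(recall_and_specificity(y_true, under_flagging))  # (0.25, 1.0)
```

Under these definitions, a tool with high recall but poor specificity catches most fakes while also flagging many authentic images. The inverse pattern rarely flags authentic images but misses many fakes.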

Paper Structure (56 sections, 12 figures, 18 tables)

Figures (12)

  • Figure 1: Example of a tampered image created via copy-move manipulation.
  • Figure 2: Example of a deepfake image generated using a diffusion model.
  • Figure 3: InVID & WeVerify analysis of a real image. ELA and multiple compression detectors converge on false suspicion driven by natural colour contrast.
  • Figure 4: Forensically clone detection on a real image. Recurring architectural textures produce excessive false clone matches.
  • Figure 5: FotoForensics ELA analysis of a fake AI-generated image. The featureless ELA map reveals no detectable manipulation traces.
  • ...and 7 more figures