Robust and Calibrated Detection of Authentic Multimedia Content

Sarim Hashmi, Abdelrahman Elsayed, Mohammed Talha Alam, Samuele Poppi, Nils Lukas

TL;DR

The paper tackles the dual challenges of deepfake detectability: post-hoc indistinguishability and adversarial robustness. It replaces binary real/fake detection with a calibrated Authenticity Index built on reconstruction-free inversion to quantify how plausibly an image could be resynthesized by modern generators. Through differential-evolution–driven calibration and a rigorously defined threat model, the approach achieves high-precision authentication with controlled false positives and demonstrates robustness against adaptive adversaries across images and videos. Extensive experiments on multiple generators, a social-media corpus, and a video extension reveal stronger generalization and practical resilience compared to traditional detectors, while also highlighting the limits of current inversion-based methods and the need for model-aware calibration.

Abstract

Generative models can synthesize highly realistic content, so-called deepfakes, that are already being misused at scale to undermine digital media authenticity. Current deepfake detection methods are unreliable for two reasons: (i) distinguishing inauthentic content post-hoc is often impossible (e.g., with memorized samples), leading to an unbounded false positive rate (FPR); and (ii) detection lacks robustness, as adversaries can adapt to known detectors with near-perfect accuracy using minimal computational resources. To address these limitations, we propose a resynthesis framework to determine if a sample is authentic or if its authenticity can be plausibly denied. We make two key contributions focusing on the high-precision, low-recall setting against efficient (i.e., compute-restricted) adversaries. First, we demonstrate that our calibrated resynthesis method is the most reliable approach for verifying authentic samples while maintaining controllable, low FPRs. Second, we show that our method achieves adversarial robustness against efficient adversaries, whereas prior methods are easily evaded under identical compute budgets. Our approach supports multiple modalities and leverages state-of-the-art inversion techniques.
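To make the idea of a "controllable, low FPR" concrete, the following is a minimal sketch of threshold selection on a held-out set of generated samples. It is not the paper's calibration procedure (the TL;DR mentions differential-evolution-driven calibration); it swaps in a plain empirical quantile rule. The assumptions are: each sample has a scalar score where higher means stronger evidence of authenticity, and a false positive is a generated sample that gets certified as authentic. The function names `calibrate_threshold` and `empirical_fpr` are hypothetical.

```python
import numpy as np


def calibrate_threshold(fake_scores, target_fpr):
    """Pick the smallest threshold tau such that, on the calibration set,
    the fraction of generated (fake) samples scoring >= tau is <= target_fpr.

    Assumption: higher score = more evidence that the sample is authentic.
    """
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")
    scores = np.sort(np.asarray(fake_scores, dtype=float))
    n = scores.size
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Number of fakes that are allowed to pass the threshold.
    k = int(np.floor(target_fpr * n))
    if k >= n:
        return float("-inf")  # every sample may pass
    # Place tau just above the (k+1)-th largest fake score, so that at most
    # k fakes satisfy score >= tau (fewer if there are ties).
    return float(np.nextafter(scores[n - k - 1], np.inf))


def empirical_fpr(fake_scores, tau):
    """Fraction of fake samples that would be certified as authentic."""
    scores = np.asarray(fake_scores, dtype=float)
    return float(np.mean(scores >= tau))
```

A threshold chosen this way bounds the FPR only on the calibration samples; how well it transfers to unseen generators depends on how representative that set is, which is presumably why the TL;DR stresses model-aware calibration.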

Paper Structure

This paper contains 27 sections, 16 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Conceptual illustration of the Authenticity Index. (A) Traditional post-hoc detectors separate real from fake using feature cues but struggle as generators improve. (B) Our method instead asks: can a generator resynthesize the query image $x$? The similarity $s(x,\tilde{x})$ between $x$ and its inversion $\tilde{x}$ is calibrated into the Authenticity Index, where high similarity implies plausible deniability and low similarity indicates that the image is likely authentic. This allows us to reliably identify authentic content that the generator is unlikely to have generated.
  • Figure 2: Distribution of the A-Index$(x,\tilde{x})$ for fake and real images.
  • Figure 3: Architecture of the Authenticity Index (A-Index). Given an input image $x$, a reconstruction-free inverter $G_e^{-1}$ produces an inverted reconstruction $\tilde{x}$. We then compute complementary similarities between $x$ and $\tilde{x}$: pixel fidelity (PSNR), structural fidelity (SSIM), perceptual distance ($1-\text{LPIPS}$), and semantic consistency (CLIP cosine). A calibrated weighted combiner (learned $\alpha_1, \alpha_2, \alpha_3, \alpha_4$) produces a scalar $s(x,\tilde{x})$, yielding the A-Index in $[0,1]$. A safety threshold $\tau$ certifies content as Authentic when $\text{A-Index} \geq \tau$, and otherwise labels it as Plausibly Deniable. We further analyze robustness by applying $\ell_\infty$-bounded perturbations $\delta$ (PGD-style) through $G_e^{-1}$ to maximize or minimize the A-Index.
  • Figure 4: Like most traditional methods, C2-CLIP fails to generalize, as real and fake prediction densities heavily overlap, indicating poor separability.
  • Figure 5: Robustness evaluation of the traditional D3 detector under PGD attack. Before the attack, the model confidently distinguishes between real and fake samples. After the adversarial perturbation, however, the distribution inverts completely, resulting in near-total misclassification: the detector is effectively fooled.
  • ...and 11 more figures
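
The weighted combiner and threshold rule described in the Figure 3 caption can be sketched as follows. This is an illustrative reading of the caption only, not the authors' implementation. The four metric values are taken as precomputed inputs. The normalizations are assumptions made here so that every term lies in $[0,1]$: PSNR is divided by a hypothetical cap `max_db`, and the CLIP cosine is rescaled from $[-1,1]$. The captions do not say how $s(x,\tilde{x})$ is calibrated into the A-Index, so the sketch stops at $s$ and applies the threshold rule to an A-Index value supplied by the caller.

```python
import numpy as np


def combined_similarity(psnr_db, ssim, lpips, clip_cos, alphas, max_db=50.0):
    """Weighted combination s(x, x~) of four similarity terms.

    Assumptions (not from the source): PSNR is capped at `max_db` and scaled
    to [0, 1]; CLIP cosine is mapped from [-1, 1] to [0, 1]; the weights are
    non-negative and renormalized to sum to 1.
    """
    a = np.asarray(alphas, dtype=float)
    if a.shape != (4,) or np.any(a < 0) or a.sum() == 0:
        raise ValueError("alphas must be four non-negative weights, not all zero")
    a = a / a.sum()
    terms = np.array([
        np.clip(psnr_db / max_db, 0.0, 1.0),       # pixel fidelity
        np.clip(ssim, 0.0, 1.0),                   # structural fidelity
        np.clip(1.0 - lpips, 0.0, 1.0),            # 1 - LPIPS
        np.clip((clip_cos + 1.0) / 2.0, 0.0, 1.0), # semantic consistency
    ])
    return float(a @ terms)


def certify(a_index, tau):
    """Threshold rule from the Figure 3 caption: Authentic iff A-Index >= tau."""
    return "Authentic" if a_index >= tau else "Plausibly Deniable"


if __name__ == "__main__":
    s = combined_similarity(25.0, 0.5, 0.5, 0.0, alphas=[1, 1, 1, 1])
    print(f"s = {s:.2f}")
    print(certify(a_index=0.9, tau=0.8))
```

Renormalizing the weights keeps $s$ inside $[0,1]$ whatever scale the learned $\alpha_i$ have, which makes a single threshold $\tau$ easier to interpret across generators.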