GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response

Govind Mittal; Chinmay Hegde; Nasir Memon

TL;DR

GOTCHA addresses the problem of authenticating live video interactions against Real-Time Deepfakes (RTDFs) by introducing a challenge-response framework that elicits detectable artifacts in RTDF outputs. It combines a taxonomy of facial challenges with a large in-person dataset and both human and automated evaluation to show that carefully designed tasks can reveal deepfakes in real time, typically within about 15 seconds. A fidelity-score model based on a 3D‑CNN and contrastive learning, together with challenge-specific compliance detectors, achieves an automated AUC of 80.1%, while human evaluators reach 88.6% AUC, underscoring the method’s interpretability and scalability. The work also analyzes adaptive adversaries and usability tradeoffs, and releases data and code to support reproducibility and further research on practical, explainable real-time deepfake defenses.
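
The AUC figures above summarize how well a score separates real videos from deepfakes. As a generic illustration (not the paper's evaluation code), the area under the ROC curve equals the probability that a randomly chosen real sample scores higher than a randomly chosen deepfake, with ties counted as half. The function name `roc_auc` and the scores below are hypothetical, chosen only to show the computation.

```python
from typing import Sequence


def roc_auc(scores_real: Sequence[float], scores_fake: Sequence[float]) -> float:
    """Mann-Whitney formulation of ROC AUC: the fraction of (real, fake)
    pairs in which the real sample scores higher (ties count as 0.5)."""
    if not scores_real or not scores_fake:
        raise ValueError("need at least one real and one fake score")
    wins = 0.0
    for r in scores_real:
        for f in scores_fake:
            if r > f:
                wins += 1.0
            elif r == f:
                wins += 0.5
    return wins / (len(scores_real) * len(scores_fake))


# Hypothetical fidelity scores (higher = more likely real); not data from the paper.
real = [0.91, 0.84, 0.77, 0.60]
fake = [0.72, 0.55, 0.40, 0.30]
print(roc_auc(real, fake))  # → 0.9375 (15 of 16 pairs ordered correctly)
```

Under this reading, an AUC of 80.1% means a real sample outscores a deepfake in roughly 80% of such pairings. The pairwise loop is quadratic but keeps the definition explicit; a rank-based implementation gives the same value faster on large sets.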

Abstract

With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs have now made it feasible to replace an imposter's face with that of their victim in live video interactions. This advance in deepfakes demands that detection rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs. To bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. We evaluate representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrade the quality of state-of-the-art deepfake generators. These results are corroborated by both human evaluators and a new automated scoring function, leading to 88.6% and 80.1% AUC, respectively. The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios. We provide access to data and code at https://github.com/mittalgovind/GOTCHA-Deepfakes.
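
The challenge-response idea can be sketched as a verifier that picks a challenge unpredictably, waits for the live response, and accepts it only if it arrives in time, complies with the request, and looks realistic. This is a minimal sketch under stated assumptions, not the authors' system: the challenge names, the 15-second budget, the 0.5 threshold, and the helpers `issue_challenge` and `verify_response` are all hypothetical, and `complied` and `fidelity` stand in for the outputs of a compliance detector and a fidelity-score model.

```python
import secrets
import time
from dataclasses import dataclass

# Hypothetical challenge names for illustration only; the paper defines its
# own eight challenges in its taxonomy.
CHALLENGES = [
    "turn_head_left",
    "turn_head_right",
    "cover_face_with_hand",
    "put_on_sunglasses",
]

RESPONSE_BUDGET_S = 15.0  # assumed time budget for a response


@dataclass
class Challenge:
    name: str
    issued_at: float  # monotonic timestamp when the challenge was shown


def issue_challenge() -> Challenge:
    """Choose a challenge unpredictably so an imposter cannot prepare for it."""
    return Challenge(name=secrets.choice(CHALLENGES), issued_at=time.monotonic())


def verify_response(ch: Challenge, responded_at: float, complied: bool,
                    fidelity: float, threshold: float = 0.5) -> bool:
    """Accept only a timely, compliant response whose (assumed) fidelity
    score clears the threshold."""
    if responded_at - ch.issued_at > RESPONSE_BUDGET_S:
        return False  # too slow: the response may have been prepared offline
    return complied and fidelity >= threshold


ch = issue_challenge()
print(ch.name in CHALLENGES)                                             # True
print(verify_response(ch, ch.issued_at + 4.0, complied=True, fidelity=0.82))   # True
print(verify_response(ch, ch.issued_at + 20.0, complied=True, fidelity=0.82))  # False
```

Separating compliance from fidelity mirrors the distinction drawn above between challenge-specific compliance detectors and a fidelity-score model; how the real system combines the two may differ.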

Paper Structure (22 sections, 5 equations, 19 figures, 4 tables)

Introduction
Real-Time Deepfake Generation
Dissecting the Generation Pipeline
Hurdles to Generating Realistic Deepfakes
Problem Description
Challenges
A Taxonomy of Facial Challenges
Dataset Collection and Curation
Evaluation
Human Evaluation
Automated Evaluation
Defenses against Real-Time Deepfakes
Limitation of Imposters
Countermeasures
Usability
...and 7 more sections

Figures (19)

Figure 1: A generic face-swapping RTDF pipeline containing a physical webcam (top), face and landmark detector, face-swapper (auto-encoder), blending operator and a virtual webcam (right). The virtual webcam is piped into a video conferencing software (not shown). Arrows indicate relevant data flows.
Figure 2: Importance of facial shape similarity. Two imposters with distinct facial shapes produce outputs of differing quality when Ryan Reynolds is the target. Qualitative observations suggest that a closer facial-shape match yields a better fit.
Figure 3: Illustration of a method to guide the user and randomize the challenge. The user follows on-screen instructions to mimic the actions of an avatar, performing the required head movements.
Figure 4: Challenge frames of original and deepfake videos. Each row aligns outputs for the same instance of a challenge, while each column aligns the same deepfake method. The green bars represent the fidelity score, with taller bars indicating higher fidelity. A missing bar means the deepfake method failed to perform that challenge. Video version at http://govindm.me/gotcha-figures.
Figure 5: Artifacts defined for human evaluation: (a) a boundary artifact near the right brow; (b) the hand vanishing behind the face; (c) hazy facial detail; and (d) the sunglasses briefly vanishing. Red objects indicate artifact locations. Video version at http://govindm.me/gotcha-figures/.
...and 14 more figures
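
Figure 1 names a blending operator that pastes the swapped face back into the webcam frame. A minimal sketch, assuming simple per-pixel alpha blending on a grayscale image (real RTDF tools may use more elaborate schemes such as Poisson blending or color correction), computes out = mask * swapped + (1 - mask) * frame. One plausible reading of the boundary artifacts in Figure 5 is that any mismatch between the mask and the true face region shows up as a visible seam. The function `alpha_blend` and the pixel values are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def alpha_blend(swapped: Image, frame: Image, mask: Image) -> Image:
    """Per-pixel alpha compositing: mask = 1 keeps the swapped face,
    mask = 0 keeps the original frame, values in between mix the two."""
    if not (len(swapped) == len(frame) == len(mask)):
        raise ValueError("images must have the same height")
    out: Image = []
    for s_row, f_row, m_row in zip(swapped, frame, mask):
        if not (len(s_row) == len(f_row) == len(m_row)):
            raise ValueError("images must have the same width")
        out.append([m * s + (1.0 - m) * f for s, f, m in zip(s_row, f_row, m_row)])
    return out


# A 1x4 strip whose mask fades from face (1.0) to background (0.0).
swapped = [[200.0, 200.0, 200.0, 200.0]]
frame = [[50.0, 50.0, 50.0, 50.0]]
mask = [[1.0, 0.5, 0.25, 0.0]]
print(alpha_blend(swapped, frame, mask))  # → [[200.0, 125.0, 87.5, 50.0]]
```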
