Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing
Danial Samadi Vahdati, Tai Duc Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm
TL;DR
This work tackles puppeteering in AI-based talking-head videoconferencing by exploiting biometric leakage in pose-expression latents. It introduces an Enhanced Biometric Leakage (EBL) space, learned with a pose-conditioned large-margin cosine loss (PC-LMCL), and uses a temporal LSTM to fuse evidence across frames for real-time detection, all without RGB reconstruction or enrollment. Across fifteen generator/dataset combinations, the method achieves state-of-the-art detection (AUC > 0.97 on combined data; ~0.925 in cross-domain settings), generalizes well to unseen domains, and runs in real time on consumer-grade GPUs. The approach provides a practical, enrollment-free safeguard for bandwidth-efficient videoconferencing: it checks whether the driving identity matches the target identity entirely in latent space.
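For readers unfamiliar with large-margin cosine losses, the sketch below shows the standard, unconditioned form (as popularized by CosFace): embeddings and per-identity class centers are L2-normalized, a fixed margin m is subtracted from the target-class cosine, and the scaled cosines feed a softmax cross-entropy. This is only an illustration of the general technique under stated assumptions; the pose conditioning that distinguishes PC-LMCL is not described here and is not reproduced, and the function name `lmcl_loss` and the values of `s` and `m` are hypothetical.

```python
import numpy as np

def lmcl_loss(embeddings, weights, labels, s=30.0, m=0.35):
    """Standard large-margin cosine loss (CosFace-style), NOT the pose-conditioned variant.

    embeddings: (N, D) array of identity embeddings
    weights:    (C, D) array, one class-center vector per identity
    labels:     (N,) integer identity labels
    s, m:       scale and additive cosine margin (illustrative values)
    """
    # L2-normalize embeddings and class centers so dot products become cosines.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ w.T  # (N, C) cosine similarity to every identity center
    rows = np.arange(len(labels))
    # Scale all cosines, then subtract the margin only from the target-class cosine.
    logits = s * cos
    logits[rows, labels] = s * (cos[rows, labels] - m)
    # Numerically stable log-softmax followed by cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())
```

The margin forces same-identity cosines to exceed other-identity cosines by at least m, which is what makes a simple cosine threshold usable downstream.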
Abstract
AI-based talking-head videoconferencing systems reduce bandwidth by sending a compact pose-expression latent and re-synthesizing RGB at the receiver, but this latent can be puppeteered, letting an attacker hijack a victim's likeness in real time. Because every frame is synthetic, deepfake and synthetic-video detectors fail outright. To address this security problem, we exploit a key observation: the pose-expression latent inherently contains biometric information about the driving identity. We therefore introduce the first biometric-leakage defense that never looks at the reconstructed RGB video: a pose-conditioned, large-margin contrastive encoder that isolates persistent identity cues inside the transmitted latent while cancelling transient pose and expression. A simple cosine test on this disentangled embedding flags illicit identity swaps as the video is rendered. Experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real time, and generalizes strongly to out-of-distribution scenarios.
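The cosine test can be pictured with the minimal sketch below. It assumes that a per-frame identity embedding is already available from the transmitted latent and that a reference embedding for the claimed (target) identity exists; both are hypothetical inputs here. The names `flag_puppeteering`, `threshold`, and `window` are illustrative, and the sliding-window mean is a deliberately simple stand-in for the paper's learned LSTM fusion, not a reproduction of it.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_puppeteering(frame_embeddings, target_embedding, threshold=0.5, window=8):
    """Flag frames whose recent identity evidence disagrees with the target identity.

    frame_embeddings: (T, D) per-frame identity embeddings (assumed given)
    target_embedding: (D,) reference embedding of the claimed identity (assumed given)
    threshold, window: hypothetical values for illustration only
    Returns a list of booleans, True where an identity swap is suspected.
    """
    # Per-frame agreement between the driving identity and the target identity.
    scores = np.array([cosine(e, target_embedding) for e in frame_embeddings])
    flags = []
    for t in range(len(scores)):
        # Average the most recent `window` scores (simple stand-in for LSTM fusion).
        recent = scores[max(0, t - window + 1): t + 1]
        flags.append(bool(recent.mean() < threshold))
    return flags
```

For example, if the first four frames match the target and the rest come from a different identity, the window mean drops below the threshold a few frames after the swap, trading a short detection delay for robustness to single noisy frames.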
