Dormant: Defending against Pose-driven Human Image Animation
Jiachen Zhou, Mingsi Wang, Tianlin Li, Guozhu Meng, Kai Chen
TL;DR
Dormant tackles the misuse risk of pose-driven human image animation by applying a visually imperceptible perturbation to a single image, so that videos generated from the protected image are degraded, inconsistent, and less recognizable. It introduces a four-term objective: $\mathcal{L}_{vae}$, $\mathcal{L}_{feature}$, and $\mathcal{L}_{frame}$ induce misextraction of appearance features and break frame coherence, while $\mathcal{L}_{lpips}$ enforces perceptual similarity to the original image. The objective is optimized with PGD under a black-box threat model, using surrogate models such as CLIP and ReferenceNet. The approach demonstrates strong protection across eight animation methods and four datasets, transfers to image-to-video and image-to-image tasks and to real-world commercial services, and remains effective under transformations and several purification attempts. Overall, Dormant provides a practical, transferable safeguard for portrait rights against state-of-the-art pose-driven video generation, with open-source artifacts for reproducibility and further research.
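The PGD optimization summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: a random linear map (the hypothetical `encoder`) stands in for the surrogate feature extractors, the squared feature displacement stands in for the protective terms ($\mathcal{L}_{vae}$, $\mathcal{L}_{feature}$, $\mathcal{L}_{frame}$), and a squared $\ell_2$ penalty weighted by the hypothetical `lam` stands in for $\mathcal{L}_{lpips}$. The $\ell_\infty$ budget `eps`, the step size `step`, and the iteration count are likewise assumptions chosen for illustration.

```python
import numpy as np


def feature_shift(x, x_prot, encoder):
    """Squared distance between toy features of the original and protected image."""
    return float(np.sum((encoder @ (x_prot - x)) ** 2))


def pgd_protect(x, encoder, eps=8 / 255, step=1 / 255, iters=50, lam=0.1, seed=0):
    """Toy PGD: push features away from the original within an L-infinity budget.

    x       : flattened image with values in [0, 1]
    encoder : (k, d) matrix, a stand-in for a surrogate feature extractor (assumption)
    lam     : weight of the similarity penalty, a stand-in for the LPIPS term (assumption)
    """
    rng = np.random.default_rng(seed)
    # Random start: the gradient of this toy objective is exactly zero at delta = 0.
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(iters):
        # Analytic gradient of J(delta) = ||W delta||^2 - lam * ||delta||^2.
        grad = 2.0 * encoder.T @ (encoder @ delta) - 2.0 * lam * delta
        # Signed gradient ascent step.
        delta = delta + step * np.sign(grad)
        # Project back onto the L-infinity ball of radius eps.
        delta = np.clip(delta, -eps, eps)
        # Keep the protected image inside the valid pixel range [0, 1].
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return x + delta
```

With a real surrogate, the analytic gradient would be replaced by automatic differentiation through the model, and the penalty by an actual perceptual distance; the projection and clipping steps would stay the same.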
Abstract
Pose-driven human image animation has achieved tremendous progress, enabling the generation of vivid and realistic human videos from a single photo. However, it also exacerbates the risk of image misuse, as attackers may use one available image to create videos involving politics, violence, and other illegal content. To counter this threat, we propose Dormant, a novel protection approach tailored to defend against pose-driven human image animation techniques. Dormant applies a protective perturbation to a human image, preserving its visual similarity to the original but resulting in poor-quality video generation. The protective perturbation is optimized to induce misextraction of appearance features from the image and create incoherence among the generated video frames. Our extensive evaluation across 8 animation methods and 4 datasets demonstrates the superiority of Dormant over 6 baseline protection methods, leading to misaligned identities, visual distortions, noticeable artifacts, and inconsistent frames in the generated videos. Moreover, Dormant shows effectiveness on 6 real-world commercial services, even with fully black-box access.
