Dormant: Defending against Pose-driven Human Image Animation

Jiachen Zhou; Mingsi Wang; Tianlin Li; Guozhu Meng; Kai Chen

TL;DR

Dormant tackles the misuse risk of pose-driven human image animation by adding a visually imperceptible perturbation to a single image, so that videos generated from the protected image are degraded, temporally inconsistent, and less recognizable. It introduces a four-term objective that induces misextraction of appearance features (via $\mathcal{L}_{vae}$ and $\mathcal{L}_{feature}$) and incoherence across generated frames (via $\mathcal{L}_{frame}$), while enforcing perceptual similarity with $\mathcal{L}_{lpips}$. The objective is optimized with PGD under a black-box threat model, using surrogate models such as CLIP and ReferenceNet. The approach provides strong protection across eight animation methods and four datasets. It transfers to image-to-video and image-to-image tasks and to real-world commercial services, and it remains effective under image transformations and several purification attempts. Overall, Dormant offers a practical, transferable safeguard for portrait rights against state-of-the-art pose-driven video generation, with open-source artifacts for reproducibility and further research.
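
The TL;DR can be read as equations of roughly the following shape. This is a sketch under stated assumptions, not the paper's exact formulation. The weights $\lambda_1, \dots, \lambda_4$, the perturbation budget $\epsilon$, the step size $\alpha$, and the sign convention are all hypothetical. The convention assumed here is that each term is written so that minimizing it favors disruption or, for $\mathcal{L}_{lpips}$, visual similarity.

$$\mathcal{L}_{\textsc{Dormant}}(\delta) = \lambda_{1}\,\mathcal{L}_{vae} + \lambda_{2}\,\mathcal{L}_{feature} + \lambda_{3}\,\mathcal{L}_{frame} + \lambda_{4}\,\mathcal{L}_{lpips}$$

$$\delta_{t+1} = \Pi_{\|\delta\|_{\infty} \le \epsilon}\Big(\delta_{t} - \alpha\,\operatorname{sign}\big(\nabla_{\delta}\,\mathcal{L}_{\textsc{Dormant}}(\delta_{t})\big)\Big)$$

Here $\Pi$ denotes projection onto the $\ell_\infty$ ball of radius $\epsilon$. Under these assumptions, the protected image is $x + \delta_T$ after $T$ PGD steps.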

Abstract

Pose-driven human image animation has achieved tremendous progress, enabling the generation of vivid and realistic human videos from a single photo. However, this progress also exacerbates the risk of image misuse: attackers may use a single available image to create videos involving politics, violence, and other illegal content. To counter this threat, we propose Dormant, a novel protection approach tailored to defend against pose-driven human image animation techniques. Dormant applies a protective perturbation to a human image that preserves visual similarity to the original but results in poor-quality video generation. The protective perturbation is optimized to induce misextraction of appearance features from the image and to create incoherence among the generated video frames. Our extensive evaluation across 8 animation methods and 4 datasets demonstrates the superiority of Dormant over 6 baseline protection methods, with generated videos showing misaligned identities, visual distortions, noticeable artifacts, and inconsistent frames. Moreover, Dormant is effective on 6 real-world commercial services, even with fully black-box access.
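
The abstract describes optimizing a protective perturbation under a visual-similarity constraint. The sketch below illustrates only the shape of that optimization loop: $\ell_\infty$ projected gradient descent (PGD) on a toy objective. Every concrete detail is an assumption and is not taken from the paper: the function name `pgd_protect`, the linear "encoder" `W` standing in for a surrogate model, the L2 penalty standing in for LPIPS, and the defaults `eps = 8/255`, `alpha = 1/255`. The sketch omits the VAE and frame-incoherence terms and is not Dormant's implementation.

```python
import numpy as np


def pgd_protect(x, W, eps=8 / 255, alpha=1 / 255, steps=50, lam=0.1, seed=0):
    """Toy L_inf PGD for a protective perturbation (illustrative only).

    Hypothetical setup: a linear map W stands in for a surrogate image
    encoder, and an L2 penalty on delta stands in for the LPIPS term.
    The loss minimized is  -||W(x + delta) - W x||^2 + lam * ||delta||^2,
    i.e. push surrogate features away from the clean ones while keeping
    the perturbation small.
    """
    rng = np.random.default_rng(seed)
    f_clean = W @ x
    # Random start inside the eps-ball (at delta = 0 this gradient is zero).
    delta = rng.uniform(-eps, eps, size=x.shape)
    delta = np.clip(x + delta, 0.0, 1.0) - x
    for _ in range(steps):
        diff = W @ (x + delta) - f_clean
        # Analytic gradient of the toy loss with respect to delta.
        grad = -2.0 * W.T @ diff + 2.0 * lam * delta
        # Signed-gradient descent step, as in L_inf PGD.
        delta = delta - alpha * np.sign(grad)
        # Project back onto the eps-ball and the valid pixel range [0, 1].
        delta = np.clip(delta, -eps, eps)
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return x + delta
```

The sign step followed by clipping is what makes this PGD under an $\ell_\infty$ budget. The clip to [0, 1] keeps the result a valid image, and because x lies in [0, 1], it never enlarges the perturbation beyond `eps`. Swapping the toy loss for differentiable surrogate models (e.g. CLIP or ReferenceNet, as the TL;DR mentions) would require an autodiff framework instead of the hand-written gradient.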

Paper Structure (31 sections, 13 equations, 21 figures, 7 tables, 1 algorithm)

Introduction
Background
  Latent Diffusion Model
  LDM for Human Image Animation
  Protection Methods against LDM
Methodology
  Threat Model
  Feature Misextraction
  Frame Incoherence
  Dormant
Evaluation
  Experimental Setup
  Protection Performance
  Human and GPT-4o Studies
  Protection Robustness
...and 16 more sections

Figures (21)

Figure 1: Illustration of defense against pose-driven human image animation. The video generated from the protected image displays mismatched identities and distorted backgrounds.
Figure 2: Overview of Dormant. Here, we present the four components of our proposed objective function $\mathcal{L}_{\textsc{Dormant}}$, which includes: $\mathcal{L}_{vae}$ and $\mathcal{L}_{feature}$ for feature misextraction, $\mathcal{L}_{frame}$ for frame incoherence, and $\mathcal{L}_{lpips}$ for visual similarity.
Figure 3: Qualitative comparisons with baseline protections against various pose-driven human image animation methods.
Figure 4: Qualitative results on various datasets.
Figure 5: Protection robustness of Dormant against various transformations under different parameter settings.
...and 16 more figures
