UVCG: Leveraging Temporal Consistency for Universal Video Protection

KaiZhou Li, Jindong Gu, Xinchun Yu, Junjie Cao, Yansong Tang, Xiao-Ping Zhang

TL;DR

UVCG embeds the content of another video (the target video) within a protected video by introducing continuous, imperceptible perturbations. These perturbations force the encoder of editing models to map continuous inputs to misaligned continuous outputs, thereby inhibiting the generation of videos consistent with the intended textual prompts.

Abstract

The security risks of AI-driven video editing have garnered significant attention. Although recent studies indicate that adding perturbations to images can protect them from malicious edits, directly applying image-based methods to perturb each frame of a video is ineffective, because video editing techniques exploit inter-frame consistency to restore individually perturbed content. To address this challenge, we leverage the temporal consistency of video content to propose a straightforward, efficient, and broadly applicable approach, Universal Video Consistency Guard (UVCG). UVCG embeds the content of another video (the target video) within a protected video by introducing continuous, imperceptible perturbations. These perturbations force the encoder of editing models to map continuous inputs to misaligned continuous outputs, thereby inhibiting the generation of videos consistent with the intended textual prompts. Additionally, by leveraging the similarity in perturbations between adjacent frames, we improve the computational efficiency of perturbation generation with a perturbation-reuse strategy. We applied UVCG across various versions of Latent Diffusion Models (LDM) and assessed its effectiveness and generalizability across multiple LDM-based editing pipelines. The results confirm the effectiveness, transferability, and efficiency of our approach in safeguarding video content from unauthorized modifications.
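
The core idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy linear encoder `W` in place of the LDM encoder, a signed-gradient (PGD-style) update under an L-infinity budget `eps`, and it models perturbation reuse as warm-starting each frame from the previous frame's perturbation with fewer steps. All function names, hyperparameters, and data here are hypothetical.

```python
import numpy as np


def protect_frame(W, frame, target, eps=0.05, step=0.01, n_steps=20, init=None):
    """Push the latent of `frame` toward the latent of `target`.

    Minimizes ||W @ (frame + delta) - W @ target||^2 subject to |delta| <= eps,
    where W is a toy linear stand-in for the editing model's encoder (assumption).
    """
    # Warm-start from an earlier perturbation if one is given (assumed reuse scheme).
    delta = np.zeros_like(frame) if init is None else init.copy()
    z_target = W @ target
    for _ in range(n_steps):
        residual = W @ (frame + delta) - z_target
        grad = 2.0 * (W.T @ residual)  # gradient of the squared latent distance
        # Signed-gradient step, then projection back onto the L-infinity ball.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta


def protect_video(W, frames, targets, eps=0.05, step=0.01,
                  first_steps=40, reuse_steps=10):
    """Perturb frames in order, reusing the previous perturbation as the start."""
    deltas, prev = [], None
    for frame, target in zip(frames, targets):
        n_steps = first_steps if prev is None else reuse_steps
        prev = protect_frame(W, frame, target, eps, step, n_steps, init=prev)
        deltas.append(prev)
    return np.stack(deltas)


def latent_gap(W, x, target):
    """Distance between the encoded input and the encoded target."""
    return float(np.linalg.norm(W @ x - W @ target))


rng = np.random.default_rng(0)
dim, latent_dim, n_frames = 32, 8, 4
W = rng.standard_normal((latent_dim, dim)) / np.sqrt(dim)

# Temporally consistent toy "videos": each frame drifts slightly from the first.
drift = 0.005 * np.arange(n_frames)[:, None]
frames = rng.random(dim) + drift * rng.standard_normal(dim)
targets = rng.random(dim) + drift * rng.standard_normal(dim)

deltas = protect_video(W, frames, targets)
for k in range(n_frames):
    before = latent_gap(W, frames[k], targets[k])
    after = latent_gap(W, frames[k] + deltas[k], targets[k])
    print(f"frame {k}: latent gap {before:.3f} -> {after:.3f}")
```

Because adjacent frames and their targets differ only slightly in this toy setup, the previous frame's perturbation is already close to a good solution, so later frames need fewer update steps than the first. A real pipeline would replace `W` with the encoder of the protection model and compute gradients by automatic differentiation.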

Paper Structure

This paper contains 17 sections, 5 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: Overview of Framework. An attacker can modify the content of a video according to their intent by using textual descriptions and then generate malicious videos through any video editing pipeline (top). We can immunize the video by introducing imperceptible perturbations, thereby disrupting their ability to perform such edits (bottom).
  • Figure 2: Right: Overview of UVCG. When applying UVCG, our goal is to map the continuous representations of the original video to the continuous representations of the target video. Left: Feature Transfer. The top represents the feature space of the target video, while the bottom represents the feature space of the original video. By adding continuous perturbations, we guide the feature space of the original video towards that of the target video (indicated by the dashed line in the figure).
  • Figure 3: Protection Effectiveness on Tokenflow. The base model used for editing the video on the left is SD-v2.1, while the one used for the right-side video is SD-v1.5. First row: The original video. Second row: The target video. Third row: The immunized video. Fourth row: Editing results without immunization. Fifth row: Editing results after applying UVCG with SD-v1.4 as the protection model. Sixth row: Editing results after UVCG using SD-v2.1 as the protection model.
  • Figure 4: Protection Effectiveness on Text2Video-zero. The base model employed for editing in Text2Video-zero is Instruct-pix2pix. First row: The original video. Second row: The target video. Third row: The immunized video. Fourth row: Editing results without immunization. Fifth row: Editing results after applying UVCG with SD-v1.4 as the protection model. Sixth row: Editing results after UVCG using SD-v2.1 as the protection model.
  • Figure 5: GPU time consumption. Protecting a frame video on an NVIDIA RTX A6000 takes 20,500 seconds with Photoguard, 1,700 seconds with PRIME, and 2,100 seconds with UVCG.
  • ...and 10 more figures