Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

Rohit Chowdhury, Aniruddha Bala, Rohan Jaiswal, Siddharth Roheda

Abstract

The rapid progress of image-to-video (I2V) generation models has introduced significant risks by enabling deceptive or malicious video synthesis from a single image. Prior defenses such as I2VGuard attempt to immunize images by inducing spatio-temporal degradation, which does not necessarily provide meaningful protection, since residual motion can still convey malicious intent. In this work, we introduce Vid-Freeze -- a novel adversarial defense that adds imperceptible perturbations to enforce temporal freezing in generated videos. Our method explicitly targets attention dynamics in I2V models to suppress motion synthesis. As a result, immunized images produce standstill or near-static videos, effectively blocking malicious content generation. Experiments demonstrate strong protection across models and support temporal freezing as a promising direction for proactive and meaningful defense against I2V misuse.

Paper Structure

This paper contains 21 sections, 8 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Noise predictions across sampling steps ($0\leq k < 25$) for a given frame. Predictions at early sampling steps encode prominent motion-defining structure and progressively become noisier, especially towards the end.
  • Figure 2: Sanity check of the hypothesis. Top rows show motion under the standard pipeline. Bottom rows show inference with temporal attention hard-coded so that first-column entries are set to 1, yielding static videos.
  • Figure 3: Overview of Vid-Freeze. Given an input image, Vid-Freeze optimizes a small adversarial perturbation under a bounded pixel budget so that the perturbed image drives the I2V denoiser toward a temporal-freezing behavior. During optimization, we extract intermediate attention maps and apply a freezing objective $\mathcal{L}_{freeze}$ that suppresses cross-frame motion propagation, causing later frames to collapse toward the first frame.
  • Figure 4: Qualitative results comparing immunization strategies on CogVideoX. Vid-Freeze produces frozen videos with near-identical frames, whereas I2VGuard produces a spatio-temporally degraded video that may still harm victims.
  • Figure 5: Qualitative results comparing immunization strategies on SVD. Vid-Freeze produces frozen videos with near-identical frames, whereas I2VGuard produces a spatio-temporally degraded video that may still harm victims.
  • ...and 5 more figures
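
The sanity check described in the Figure 2 caption can be illustrated with a toy example. The sketch below is not the authors' code: it assumes a single temporal attention layer acting on one spatial location, with hypothetical helper names (`apply_attention`, `frozen_weights`, `first_frame_mass_loss`). It shows that when every frame's attention row puts weight 1 on the first frame, each output frame copies the first frame's value, which is the static behavior the caption reports. The loss at the end is only an illustrative stand-in; the actual definition of $\mathcal{L}_{freeze}$ is not given in this excerpt.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def apply_attention(weights, values):
    """Mix per-frame value vectors with an F x F attention weight matrix.

    weights[i][j] is how much output frame i draws from input frame j.
    values[j] is the feature vector of frame j at one spatial location.
    """
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]


def frozen_weights(num_frames):
    """Hard-coded attention: first-column entries are 1, all others 0."""
    return [
        [1.0 if j == 0 else 0.0 for j in range(num_frames)]
        for _ in range(num_frames)
    ]


def first_frame_mass_loss(weights):
    """Hypothetical freezing proxy (an assumption, not the paper's loss):
    the average attention mass that is NOT placed on the first frame."""
    return sum(1.0 - row[0] for row in weights) / len(weights)


# Three frames with 2-dimensional features that differ from frame to frame.
values = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

# Standard pipeline: softmax over (here, all-zero) logits gives uniform mixing.
uniform = [softmax([0.0, 0.0, 0.0]) for _ in range(3)]
mixed = apply_attention(uniform, values)

# Hard-coded pipeline: every output frame collapses onto the first frame.
frozen = apply_attention(frozen_weights(3), values)
```

Here `frozen` contains three copies of the first frame's features, while `mixed` blends all frames. Under this toy proxy the hard-coded weights score zero and uniform attention scores 2/3, which is one simple way to quantify how far a temporal attention map is from the frozen configuration.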