Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Yujie Zhou, Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Qidong Huang, Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Anyi Rao, Jiaqi Wang, Li Niu

TL;DR

Light-A-Video is a training-free video relighting method that extends frame-wise image relighting models with temporal consistency. It introduces Consistent Light Attention (CLA) to stabilize lighting across frames and Progressive Light Fusion (PLF) to inject lighting changes gradually, guided by video diffusion priors. The approach requires no additional training and works with multiple video diffusion backbones (e.g., CogVideoX and AnimateDiff), improving temporal coherence while preserving the quality of relit frames. Experiments show reduced flicker and coherent lighting transitions across diverse videos.

Abstract

Recent advancements in image relighting models, driven by large-scale datasets and pre-trained diffusion models, have enabled the imposition of consistent lighting. However, video relighting still lags, primarily due to the excessive training costs and the scarcity of diverse, high-quality video relighting datasets. A simple application of image relighting models on a frame-by-frame basis leads to two issues: lighting source inconsistency and relighted appearance inconsistency, resulting in flicker in the generated videos. In this work, we propose Light-A-Video, a training-free approach to achieve temporally smooth video relighting. Adapted from image relighting models, Light-A-Video introduces two key techniques to enhance lighting consistency. First, we design a Consistent Light Attention (CLA) module, which enhances cross-frame interactions within the self-attention layers of the image relighting model to stabilize the generation of the background lighting source. Second, leveraging the physical principle of light transport independence, we apply linear blending between the source video's appearance and the relighted appearance, using a Progressive Light Fusion (PLF) strategy to ensure smooth temporal transitions in illumination. Experiments show that Light-A-Video improves the temporal consistency of relighted video while maintaining the relighted image quality, ensuring coherent lighting transitions across frames. Project page: https://bujiazi.github.io/light-a-video.github.io/.
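The abstract describes CLA only as enhancing cross-frame interactions inside self-attention. As a rough illustration of what such an interaction could look like, the NumPy sketch below blends ordinary per-frame self-attention with a second pass computed on temporally averaged hidden states. The averaging scheme, the weight `gamma`, and all function names are assumptions made for this example; they are not taken from the abstract and should not be read as the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, w_q, w_k, w_v):
    # h has shape (frames, tokens, dim); each frame attends only to itself.
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def cross_frame_attention(h, w_q, w_k, w_v, gamma=0.5):
    # Hypothetical cross-frame variant (an assumption for illustration):
    # mix the per-frame result with attention over frame-averaged features.
    per_frame = self_attention(h, w_q, w_k, w_v)
    h_mean = h.mean(axis=0, keepdims=True)           # (1, tokens, dim)
    shared = self_attention(h_mean, w_q, w_k, w_v)   # same for every frame
    return (1.0 - gamma) * per_frame + gamma * shared

# Toy data: 4 frames, 6 tokens per frame, 8 channels.
rng = np.random.default_rng(0)
frames, tokens, dim = 4, 6, 8
h = rng.normal(size=(frames, tokens, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

per_frame = self_attention(h, w_q, w_k, w_v)
out = cross_frame_attention(h, w_q, w_k, w_v, gamma=0.5)
```

Because the shared stream is identical for every frame, the frame-to-frame variance of `out` is exactly `(1 - gamma) ** 2` times that of `per_frame` in this toy setup, which is one simple way to see how a shared component damps per-frame jitter.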

Paper Structure

This paper contains 19 sections, 14 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Training-free video relighting. Equipped with an image relighting model (e.g., IC-Light [zhang2025scaling]) and a video diffusion model (e.g., CogVideoX [yang2024cogvideox] and AnimateDiff [guo2023animatediff]), Light-A-Video enables training-free video relighting for given video sequences or foreground sequences.
  • Figure 2: Relighted frames of vanilla IC-Light and "IC-Light + CLA". The line chart depicts the average optical flow intensity between adjacent frames. Since IC-Light performs image relighting based on each independent frame, its results show a noticeable jitter between frames, especially in the generated background lighting. Conversely, the proposed CLA facilitates consistent lighting generation by forcing interaction between frames.
  • Figure 3: The pipeline of Light-A-Video. A source video is first noised and processed through the VDM for denoising across $T_m$ steps. At each step, the predicted noise-free component with detail compensation serves as the Consistent Target $\mathbf{z}^{v}_{0 \gets t}$, inherently representing the VDM's denoising direction. Consistent Light Attention infuses $\mathbf{z}^{v}_{0 \gets t}$ with unique lighting information, transforming it into the Relight Target $\mathbf{z}^{r}_{0 \gets t}$. The Progressive Light Fusion strategy then merges the two targets to form the Fusion Target $\tilde{\mathbf{z}}_{0 \gets t}$, which provides a refined direction for the current step. The bottom-right part illustrates the iterative evolution of $\mathbf{z}^{v}_{0 \gets t}$.
  • Figure 4: Visualization of the PLF strategy. During the denoising process of the VDM, the PLF strategy progressively replaces the original Consistent Target $\mathbf{z}^{v}_{0 \gets t}$ with the Fusion Target $\tilde{\mathbf{z}}_{0 \gets t}$, guiding the denoising direction from $\mathbf{v}_t$ to $\tilde{\mathbf{v}}_t$.
  • Figure 5: Visualization of the detail compensation. $\Delta d_m$ records the difference between $\hat{\mathbf{z}}_{0 \gets m}$ and the source video in the first denoising step, which is used as a detail compensation component for detail preservation in the consistent target.
  • ...and 10 more figures
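
The captions for Figures 3 to 5 describe two operations that are easy to sketch numerically: adding a detail compensation term recorded at the first denoising step, and merging the Consistent Target with the Relight Target into a Fusion Target. The NumPy example below is a minimal sketch under stated assumptions: the sign convention for the compensation term, the convex blend with weight `lam`, the linearly decaying schedule, and every function name are hypothetical choices for illustration, not the paper's definitions.

```python
import numpy as np

def detail_compensation(source, z_hat_first):
    # Assumed sign convention: the residual that, when added back to the
    # first-step prediction, restores the source video.
    return source - z_hat_first

def consistent_target(z_hat, delta_d):
    # Predicted noise-free component plus the stored detail residual.
    return z_hat + delta_d

def fuse_targets(z_v, z_r, lam):
    # Linear blend of the consistent target z_v and the relight target z_r.
    # The convex form and the weight lam are assumptions for this sketch.
    return (1.0 - lam) * z_v + lam * z_r

def linear_schedule(num_steps, lam_start=1.0, lam_end=0.0):
    # Hypothetical fusion-weight schedule; the real schedule is not given
    # in the captions above.
    return np.linspace(lam_start, lam_end, num_steps)

# Toy latents standing in for a 4-frame video.
rng = np.random.default_rng(0)
source = rng.normal(size=(4, 8, 8))
z_hat_m = source + 0.1 * rng.normal(size=source.shape)  # first-step prediction

delta_d = detail_compensation(source, z_hat_m)
z_v = consistent_target(z_hat_m, delta_d)   # equals the source at step m
z_r = z_v + 0.5                             # toy stand-in for a relit target

lams = linear_schedule(5)
fused = [fuse_targets(z_v, z_r, lam) for lam in lams]
```

In this toy run the fused latent moves from the relight target (`lam = 1`) to the consistent target (`lam = 0`) in equal increments. A real pipeline would recompute both targets at every denoising step rather than reuse fixed arrays.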