RelightMaster: Precise Video Relighting with Multi-plane Light Images

Weikang Bian, Xiaoyu Shi, Zhaoyang Huang, Jianhong Bai, Qinghe Wang, Xintao Wang, Pengfei Wan, Kun Gai, Hongsheng Li

TL;DR

RelightMaster tackles precise, controllable video relighting by introducing a novel Multi-plane Light Image (MPLI) representation and a Light Image Adapter (LIA) that injects lighting into pre-trained Video Diffusion Transformers. It also provides RelightVideo, an Unreal Engine–based dataset of identical content under varied lighting, for training and evaluating relighting. The approach supports multiple light sources and temporally varying illumination while preserving background content, and the authors report that it outperforms state-of-the-art methods. This work offers a practical pathway for accurate lighting control in video synthesis and editing while leveraging existing generative priors.

Abstract

Recent advances in diffusion models enable high-quality video generation and editing, but precise relighting with consistent video content, which is critical for shaping scene atmosphere and viewer attention, remains unexplored. Mainstream text-to-video (T2V) models lack fine-grained lighting control due to text's inherent limitation in describing lighting details and insufficient pre-training on lighting-related prompts. Additionally, constructing high-quality relighting training data is challenging, as real-world controllable lighting data is scarce. To address these issues, we propose RelightMaster, a novel framework for accurate and controllable video relighting. First, using the Unreal Engine, we build RelightVideo, the first dataset with identical dynamic content under varying, precisely specified lighting conditions. Second, we introduce Multi-plane Light Image (MPLI), a novel visual prompt inspired by Multi-Plane Image (MPI). MPLI models lighting via K depth-aligned planes, representing 3D light source positions, intensities, and colors while supporting multi-source scenarios and generalizing to unseen light setups. Third, we design a Light Image Adapter that seamlessly injects MPLI into pre-trained Video Diffusion Transformers (DiT): it compresses MPLI via a pre-trained Video VAE and injects latent light features into DiT blocks, leveraging the base model's generative prior without catastrophic forgetting. Experiments show that RelightMaster generates physically plausible lighting and shadows and preserves original scene content. Demos are available at https://wkbian.github.io/Projects/RelightMaster/.
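
To make the idea of K depth-aligned light planes more concrete, the following is a minimal sketch of how point lights might be rasterized into such a stack. It is an illustration under stated assumptions, not the authors' rendering procedure: the function `render_mpli`, the light dictionary keys, the uniform plane spacing between `near` and `far`, the nearest-plane assignment, and the Gaussian footprint are all hypothetical choices. Only the use of K planes that encode position, intensity, and color comes from the abstract.

```python
import numpy as np

def render_mpli(lights, num_planes=4, height=32, width=32,
                near=1.0, far=10.0, sigma=2.0):
    """Rasterize point lights into K depth-aligned planes (hypothetical layout).

    Each light is a dict with keys 'uv' (pixel column, row), 'depth',
    'intensity', and 'color' (RGB in [0, 1]). These names are illustrative
    assumptions, not the paper's interface.
    """
    planes = np.zeros((num_planes, height, width, 3), dtype=np.float32)
    # Assumed plane placement: uniform spacing in depth between near and far.
    plane_depths = np.linspace(near, far, num_planes)
    rows, cols = np.mgrid[0:height, 0:width]
    for light in lights:
        # Assign the light to the plane whose depth is closest to its own.
        k = int(np.argmin(np.abs(plane_depths - light["depth"])))
        u, v = light["uv"]
        # Gaussian footprint centred on the light's image position (assumption).
        blob = np.exp(-((cols - u) ** 2 + (rows - v) ** 2) / (2.0 * sigma ** 2))
        color = np.asarray(light["color"], dtype=np.float32)
        # Additive blending lets several lights share one plane.
        planes[k] += light["intensity"] * blob[..., None] * color
    return planes

lights = [
    {"uv": (8, 8), "depth": 1.5, "intensity": 1.0, "color": (1.0, 0.9, 0.8)},
    {"uv": (24, 16), "depth": 9.0, "intensity": 0.5, "color": (0.2, 0.4, 1.0)},
]
mpli = render_mpli(lights)
print(mpli.shape)  # (4, 32, 32, 3)
```

In this sketch the image position of a blob carries the light's lateral location, the plane index carries its depth, and the pixel values carry intensity and color. Multiple sources fit in the same stack, and a time-varying light would correspond to one such stack per frame.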

Paper Structure

This paper contains 13 sections, 3 equations, 7 figures.

Figures (7)

  • Figure 1: Dataset Overview. (a) and (b) show the assets used in our relighting datasets, including the 3D scenes, human models, and animations. (c) demonstrates an example lighting configuration. For each scene that has been set up, denoted as w/o light, we sample multiple camera trajectories and additional light sources to render video editing pairs with diverse motion and light conditions.
  • Figure 2: An overview of our relighting framework. A Multi-plane Light Image (MPLI) contains 4 light images, and each MPLI is encoded as a latent light feature by the Video VAE. $N$ latent light features are passed to the DiT model via our proposed Light Image Adapter (LIA), which is initialized from the pretrained patchify module and shared across different DiT blocks. The original video and the noise are temporally concatenated. The parameters of the pretrained DiT model are frozen except for the 3D attention layers. We also add a LoRA module after the 3D attention layer to learn the additional editing knowledge.
  • Figure 3: Relighting with fixed light source. (a) and (b) demonstrate the light source position, (c) reflects the light source intensity, and (d) indicates the light color.
  • Figure 4: Relighting with temporally-varying lights and multi-lights. Our RelightMaster supports multiple and temporally-varying light source control. The corresponding Multi-plane Light Images (MPLI) at different moments are visualized for better understanding.
  • Figure 5: Comparison with other video relighting methods. We translate our precise light control signals to text and feed them to Light-A-Video [zhou2025light] and TC-Light [zhou2025light].
  • ...and 2 more figures
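
The data flow described in the Figure 2 caption can be sketched with toy arrays. This is a rough illustration only, not the authors' implementation. The caption states that the source video and the noise are concatenated along time, that the adapter is initialized from the patchify module and shared across blocks, and that a LoRA module is added. Everything else here is an assumption made for the sketch: all tensor sizes, the helper names `patchify` and `block`, summing the N light features, adding them only to the noisy half of the sequence, the zero-initialized LoRA matrix, and the use of a single linear map in place of 3D attention.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 16   # hypothetical latent frames, height, width, channels
P, D, N, R = 2, 32, 3, 4   # patch size, token width, light latents, LoRA rank

def patchify(latent, weight):
    """Cut a (T, H, W, C) latent into P x P patches and project each to D dims."""
    t, h, w, c = latent.shape
    x = latent.reshape(t, h // P, P, w // P, P, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(t * (h // P) * (w // P), P * P * c) @ weight

video = rng.standard_normal((T, H, W, C))      # stand-in for the encoded source video
noise = rng.standard_normal((T, H, W, C))      # stand-in for the noisy target latent
lights = rng.standard_normal((N, T, H, W, C))  # stand-ins for N latent light features

w_patch = 0.02 * rng.standard_normal((P * P * C, D))  # toy "pretrained" patchify weights
w_lia = w_patch.copy()  # adapter starts as a copy of the patchify weights

# Temporal concatenation: source frames first, noisy frames second.
tokens = patchify(np.concatenate([video, noise], axis=0), w_patch)
n_src = tokens.shape[0] // 2

# Assumption: light tokens are summed over the N features.
light_tokens = sum(patchify(l, w_lia) for l in lights)

def block(x, w_attn, lora_a, lora_b):
    """Toy stand-in for one DiT block: inject light, then a linear map plus LoRA."""
    x = x.copy()
    x[n_src:] += light_tokens  # assumption: inject into the noisy half only
    return x @ w_attn + (x @ lora_a) @ lora_b

w_attn = 0.1 * rng.standard_normal((D, D))
lora_a = 0.01 * rng.standard_normal((D, R))
lora_b = np.zeros((R, D))  # zero init: the LoRA branch contributes nothing at first

out = block(tokens, w_attn, lora_a, lora_b)
print(tokens.shape, out.shape)
```

Because the adapter starts from the patchify weights and the low-rank branch starts at zero in this sketch, the toy block initially behaves like the base mapping applied to light-augmented tokens. This loosely mirrors the stated goal of reusing the base model's prior.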