ReLumix: Extending Image Relighting to Video via Video Diffusion Models
Lezhong Wang, Shutong Jin, Ruiqi Cui, Anders Bjorholm Dahl, Jeppe Revall Frisvad, Siavash Bigdeli
TL;DR
ReLumix addresses the challenge of controllable lighting in video by decoupling relighting from temporal propagation. It enables any image-based relighting technique to be applied to video via a two-stage pipeline: relight a reference frame using a preferred method, then propagate the lighting across frames with a fine-tuned Stable Video Diffusion (SVD) model. Three key innovations (embedding fusion, gated cross-attention, and temporal bootstrapping) enable mask-free, coherent illumination transfer learned from synthetic data with strong sim-to-real generalization, achieving significant speedups over frame-inversion baselines. The approach demonstrates high fidelity and temporal stability on the CARLA and DAVIS datasets, offering a flexible, scalable solution for dynamic lighting control in practical video editing workflows.
Abstract
Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. Existing methods often lack flexibility, restricting users to specific relighting models. This paper introduces ReLumix, a novel framework that decouples the relighting algorithm from temporal synthesis, thereby enabling any image relighting technique to be applied to video. Our approach reformulates video relighting as a simple yet effective two-stage process: (1) an artist relights a single reference frame using any preferred image-based technique (e.g., diffusion models, physics-based renderers); and (2) a fine-tuned Stable Video Diffusion (SVD) model propagates this target illumination throughout the sequence. To ensure temporal coherence and prevent artifacts, we introduce a gated cross-attention mechanism for smooth feature blending and a temporal bootstrapping strategy that harnesses SVD's powerful motion priors. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos. The method demonstrates significant improvements in visual fidelity, offering a scalable and versatile solution for dynamic lighting control.
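To make the idea of gated cross-attention concrete, the following is a minimal NumPy sketch of one common formulation: video-frame tokens attend to tokens from the relit reference frame, and the attended features are added back through a tanh gate. The abstract does not specify the exact gating used in ReLumix, so the tanh gate, the single attention head, and all names here (`gated_cross_attention`, `frame_tokens`, `ref_tokens`, `gate`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_cross_attention(frame_tokens, ref_tokens, w_q, w_k, w_v, gate):
    """Single-head cross-attention with a tanh-gated residual (assumed form).

    frame_tokens: (n, d) features of a video frame (queries)
    ref_tokens:   (m, d) features of the relit reference frame (keys/values)
    gate:         scalar; tanh(gate) scales how much reference lighting is mixed in
    """
    q = frame_tokens @ w_q
    k = ref_tokens @ w_k
    v = ref_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n, m) scaled dot products
    attn = softmax(scores, axis=-1)           # each row sums to 1
    attended = attn @ v                       # (n, d) reference features per query
    # With gate == 0 the block is an identity mapping, so the pretrained
    # backbone's behaviour is untouched until the gate is learned.
    return frame_tokens + np.tanh(gate) * attended


# Toy usage with random features (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
frame_tokens = rng.normal(size=(5, d))
ref_tokens = rng.normal(size=(3, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

closed = gated_cross_attention(frame_tokens, ref_tokens, w_q, w_k, w_v, gate=0.0)
opened = gated_cross_attention(frame_tokens, ref_tokens, w_q, w_k, w_v, gate=1.0)
```

In this sketch a gate of zero leaves the frame features unchanged, while a non-zero gate blends in the reference-frame features smoothly; this is one plausible reading of "smooth feature blending", not a confirmed detail of the method.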
