FlowPortal: Residual-Corrected Flow for Training-Free Video Relighting and Background Replacement

Wenshuo Gao, Junyi Fan, Jiangyue Zeng, Shuai Yang

TL;DR

FlowPortal tackles training-free video relighting with background replacement. Its core is a Residual-Corrected Flow, $V_t^{\text{edit}} = V_t^{\text{tar}} + V_t^{\text{res}}$, which enforces Condition Consistency: the input video is reconstructed exactly when the source and target conditions are identical, and edits follow the direction of the condition change when they differ. A Decoupled Condition Design, High-Frequency Transfer, and a masking mechanism separate foreground relighting from background generation, yielding temporal coherence, structural fidelity, and natural illumination while remaining efficient. According to the reported quantitative results and user studies, the approach outperforms state-of-the-art training-free methods and many training-based methods on video–text alignment, temporal smoothness, and detail/structure metrics. The result is a practical, inversion-free editing pipeline for real-world video workflows that lays groundwork for video editing tasks beyond relighting and background replacement.

Abstract

Video relighting with background replacement is a challenging task critical for applications in film production and creative media. Existing methods struggle to balance temporal consistency, spatial fidelity, and illumination naturalness. To address these issues, we introduce FlowPortal, a novel training-free flow-based video relighting framework. Our core innovation is a Residual-Corrected Flow mechanism that transforms a standard flow-based model into an editing model, guaranteeing perfect reconstruction when input conditions are identical and enabling faithful relighting when they differ, resulting in high structural consistency. This is further enhanced by a Decoupled Condition Design for precise lighting control and a High-Frequency Transfer mechanism for detail preservation. Additionally, a masking strategy isolates foreground relighting from the purely generative background process. Experiments demonstrate that FlowPortal achieves superior performance in temporal coherence, structural preservation, and lighting realism, while maintaining high efficiency. Project Page: https://gaowenshuo.github.io/FlowPortalProject/.

Paper Structure

This paper contains 23 sections, 17 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: We propose a novel training-free FlowPortal framework for efficient video background replacement and foreground relighting.
  • Figure 2: Illustration of the proposed Residual-Corrected Flow. (a) The Naive Edit Flow builds denoising trajectories under source and target conditions using the same noise $\epsilon$. When applied to a real input video $z_0$, the mismatch between $z_0^{\text{src}}$ and $z_0$ violates Stability under Identity. (b) Consistency Residual Velocity $V_t^{\text{res}}$ is constructed as the difference between the ideal restoration path $V_0$ and the predicted source flow $V_t^{\text{src}}$, aligning the generated $z_0^{\text{src}}$ with $z_0$. (c) Residual-Corrected Flow combines $V_t^{\text{tar}}$ and $V_t^{\text{res}}$ to perform reliable video relighting that preserves identity consistency (e.g., mouth shape, glasses reflection) while enabling directional condition change. The purple arrows indicate the condition that guides the velocity calculation. For simplicity, we omit the reference frame and structural conditions.
  • Figure 3: Decoupled Condition Design. The source and target conditions share identical illumination-agnostic information, differing only in their illumination-specific information.
  • Figure 4: Comparison with FlowEdit. (a) Input video. (b) FlowEdit produces blurry outputs and suffers from ghosting artifacts caused by interference from the original background (yellow region). (c)(d) Our residual reusing strategy effectively reduces the number of prediction steps with negligible quality degradation.
  • Figure 5: Qualitative comparison. The training-based Lumen exhibits insufficient lighting richness and diversity, with foreground nearly unaltered. The training-free AnyPortal and Light-A-Video show poor structural fidelity and lighting quality. Our method not only maintains structural and detail consistency but also demonstrates high-quality background generation and rich relighting effects.
  • ...and 9 more figures
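
The Figure 2 caption describes the edit velocity as $V_t^{\text{tar}} + V_t^{\text{res}}$, with $V_t^{\text{res}}$ the difference between the ideal restoration path $V_0$ and the predicted source flow $V_t^{\text{src}}$. The toy sketch below illustrates why this construction reconstructs the input when the two conditions are identical. It is not the authors' implementation. Several things are assumptions made only for illustration: the rectified-flow convention $z_t = (1-t)\,z_0 + t\,\epsilon$ (so that $V_0 = \epsilon - z_0$), the plain Euler integrator, the scalar "conditions", and the hypothetical `toy_velocity` function standing in for a pretrained video model.

```python
import math

def toy_velocity(z, t, cond):
    """Hypothetical stand-in for a pretrained flow model's velocity prediction."""
    return [math.tanh(x) * (1.0 - t) + cond for x in z]

def residual_corrected_edit(z0, eps, cond_src, cond_tar,
                            velocity=toy_velocity, num_steps=20):
    """Euler-integrate V_edit = V_tar + V_res from t=1 (noise) down to t=0."""
    z_edit = list(eps)  # the edit trajectory starts from the shared noise
    # Assumed ideal restoration velocity under z_t = (1 - t) * z0 + t * eps.
    v_ideal = [e - x for e, x in zip(eps, z0)]
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        t_next = 1.0 - (i + 1) / num_steps
        # Noised version of the real input at time t (same noise as the edit path).
        z_src = [(1.0 - t) * x + t * e for x, e in zip(z0, eps)]
        v_src = velocity(z_src, t, cond_src)   # predicted source flow
        v_tar = velocity(z_edit, t, cond_tar)  # predicted target flow
        # V_res = V_0 - V_src, then V_edit = V_tar + V_res.
        v_edit = [vt + (vi - vs) for vt, vi, vs in zip(v_tar, v_ideal, v_src)]
        # t_next < t, so this step moves from noise toward data.
        z_edit = [z + (t_next - t) * v for z, v in zip(z_edit, v_edit)]
    return z_edit

z0 = [0.5, -1.2, 2.0]    # toy "input video" latent
eps = [0.1, 0.7, -0.4]   # shared noise sample

recon = residual_corrected_edit(z0, eps, cond_src=0.3, cond_tar=0.3)
edited = residual_corrected_edit(z0, eps, cond_src=0.3, cond_tar=0.8)
```

With identical conditions the source and target predictions cancel at every step, so the update reduces to the ideal restoration velocity and `recon` matches `z0` up to floating-point error, whatever the toy model predicts. With different conditions, the gap between the two predictions is the only thing that moves `edited` away from `z0`.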