TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer

Yang Liu; Chuanchen Luo; Zimo Tang; Yingyan Li; Yuran Yang; Yuanyong Ning; Lue Fan; Junran Peng; Zhaoxiang Zhang

TL;DR

TC-Light is a temporally coherent relighting framework for long, dynamic videos. It inflates a strong image relighting model to video space and then applies a two-stage post-optimization. The framework is built around a canonical representation, the Unique Video Tensor (UVT), which compresses spatiotemporal information and enables efficient, coherent optimization. A decayed multi-axis denoising approach is also applied during video-space diffusion. Stage I (exposure alignment) and Stage II (UVT refinement) jointly reduce illumination and texture flicker, yielding physically plausible results at low computational cost. The method achieves state-of-the-art temporal coherence on a challenging long-video benchmark and performs strongly in both synthetic and real-world scenarios, with potential impact on sim2real, real2real, and embodied AI data generation.
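The summary does not say how the UVT is constructed. As a minimal illustration of why a canonical tensor suppresses flicker, the sketch below assumes that every pixel of every frame has already been assigned an integer canonical index, so that pixels showing the same physical point share one entry (how such indices are obtained is outside this sketch). The canonical table is initialized by averaging all pixels mapped to each entry, and frames are re-rendered by gathering from it. The functions build_canonical and render_from_canonical are hypothetical names, and the gradient-based Stage II refinement against the relit frames is omitted; this is not TC-Light's implementation.

```python
import numpy as np


def build_canonical(frames: np.ndarray, index: np.ndarray, num_units: int) -> np.ndarray:
    """Average the colors of all pixels that share a canonical index.

    frames: (T, H, W, C) float video.
    index:  (T, H, W) int array; pixels depicting the same physical point
            across frames are ASSUMED to share an index (hypothetical input).
    Returns a (num_units, C) canonical color table.
    """
    c = frames.shape[-1]
    flat_idx = index.reshape(-1)
    flat_col = frames.reshape(-1, c)
    sums = np.zeros((num_units, c), dtype=np.float64)
    # Unbuffered scatter-add so repeated indices accumulate correctly.
    np.add.at(sums, flat_idx, flat_col)
    counts = np.bincount(flat_idx, minlength=num_units).astype(np.float64)
    # Units that no pixel maps to stay at zero instead of dividing by zero.
    return sums / np.maximum(counts, 1.0)[:, None]


def render_from_canonical(canonical: np.ndarray, index: np.ndarray) -> np.ndarray:
    """Gather canonical colors back into a (T, H, W, C) video."""
    return canonical[index]


# Two 1x2 grayscale frames: pixel 0 is the same static point but flickers
# (0.4 -> 0.6); pixel 1 shows different points in each frame.
frames = np.array([[[[0.4], [0.0]]], [[[0.6], [1.0]]]])
index = np.array([[[0, 1]], [[0, 2]]])
canonical = build_canonical(frames, index, num_units=3)
video = render_from_canonical(canonical, index)
```

Because every frame reads the shared point from a single entry, the re-rendered video is consistent across time by construction. In this toy case, four pixels are also stored in three entries, which hints at the compression the summary mentions.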

Abstract

Illumination and texture editing are critical dimensions of world-to-world transfer, which is valuable for applications such as scaling up visual data through sim2real and real2real transfer for embodied AI. Existing techniques, such as video relighting models and conditioned world generation models, realize this transfer by generatively re-rendering the input video. However, these models are largely limited to the domain of their training data (e.g., portraits) or are bottlenecked by temporal consistency and computational efficiency, especially when the input video involves complex dynamics and long durations. In this paper, we propose TC-Light, a novel generative renderer that overcomes these problems. Starting from a video preliminarily relit by an inflated video relighting model, it first optimizes an appearance embedding to align global illumination. In the second stage, it optimizes the proposed canonical video representation, the Unique Video Tensor (UVT), to align fine-grained texture and lighting. To evaluate performance comprehensively, we also establish a benchmark of long, highly dynamic videos. Extensive experiments show that our method produces physically plausible re-rendering results with superior temporal coherence and low computational cost. The code and video demos are available at https://dekuliutesla.github.io/tclight/.
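The abstract does not specify the form of the Stage I appearance embedding or its objective. Purely as an assumption for illustration, the sketch below models each frame's embedding as a 3x4 affine color matrix and fits it in closed form by least squares, so that the frame matches a per-frame target. The targets here are the temporal mean of the frames, which is only valid for a static camera; a dynamic video would need motion-compensated targets. All function names are hypothetical, and TC-Light's actual Stage I loss and optimizer may differ.

```python
import numpy as np


def fit_affine_color(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 3x4 affine color map M such that dst ~= M @ [src; 1].

    src, dst: (H, W, 3) images whose pixels are ASSUMED to correspond.
    """
    x = src.reshape(-1, 3)
    y = dst.reshape(-1, 3)
    x_h = np.concatenate([x, np.ones((x.shape[0], 1))], axis=1)  # (N, 4)
    # Solve x_h @ A ~= y for A with shape (4, 3); M is its transpose.
    a, *_ = np.linalg.lstsq(x_h, y, rcond=None)
    return a.T


def apply_affine_color(img: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Apply a 3x4 affine color map to every pixel of an (H, W, 3) image."""
    h, w, _ = img.shape
    x_h = np.concatenate([img.reshape(-1, 3), np.ones((h * w, 1))], axis=1)
    return (x_h @ m.T).reshape(h, w, 3)


def align_exposure(frames: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Fit one affine color map per frame toward its target, then apply it."""
    return np.stack([
        apply_affine_color(f, fit_affine_color(f, t))
        for f, t in zip(frames, targets)
    ])


# Two frames of a static scene that differ only by a global gain and offset,
# mimicking exposure flicker introduced by frame-wise relighting.
rng = np.random.default_rng(0)
base = rng.random((4, 5, 3))
frames = np.stack([base * 0.8, base * 1.2 + 0.05])
targets = np.broadcast_to(frames.mean(axis=0), frames.shape)
aligned = align_exposure(frames, targets)
```

A global affine map per frame can only correct frame-wide exposure and color shifts, not local texture flicker. This is consistent with the abstract's split between a global Stage I and a fine-grained Stage II.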
