Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng, Shunsi Zhang, Yingcong Chen
TL;DR
Uni-Renderer addresses the ill-posed dual task of rendering and inverse rendering by unifying them into a single dual-stream diffusion framework. It uses two distinct timesteps and a cycle-consistent loss to enable cross-conditioning between intrinsic properties and rendered images, mitigating ambiguity in intrinsic decomposition. The method is trained on a large synthetic dataset with varied metallic, roughness, and lighting attributes and demonstrates strong performance in both rendering and inverse rendering, including relighting and real-world generalization, with ablations showing the benefits of a unified model and cycle-consistency. This approach offers a practical pathway to robust intrinsic property disentanglement and photo-realistic rendering in real-world scenarios, while acknowledging the remaining domain gap, which future work could address by integrating real-world data.
Abstract
Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation lies at the core of both tasks, acting as an ideal conditional distribution transfer function from intrinsic properties to RGB images. Although existing rendering methods achieve promising results, they merely approximate this ideal for a specific scene and come with a high computational cost. Additionally, the inverse conditional distribution transfer is intractable due to its inherent ambiguity. To address these challenges, we propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks within a single diffusion framework. Inspired by UniDiffuser, we utilize two distinct time schedules to model both tasks, and with a tailored dual streaming module, we achieve cross-conditioning of two pre-trained diffusion models. This unified approach, named Uni-Renderer, allows the two processes to facilitate each other through a cycle-consistent constraint, mitigating ambiguity by enforcing consistency between intrinsic properties and rendered images. Combined with a meticulously prepared dataset, our method effectively decomposes intrinsic properties and demonstrates a strong capability to recognize changes during rendering. We will open-source our training and inference code to the public, fostering further research and development in this area.
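To make the two-timestep idea concrete, the following is a minimal sketch of how a pair of independent timesteps can select which task a single model performs. It assumes the UniDiffuser-style convention in which a stream at timestep 0 is kept clean and therefore acts as the condition, while the other stream is noised and becomes the generation target. The names `make_alpha_bar`, `add_noise`, and `dual_stream_inputs`, the linear beta schedule, and the flat-list data layout are all hypothetical choices for illustration, not the released implementation.

```python
import math
import random

T = 1000  # assumed number of diffusion steps (illustrative)


def make_alpha_bar(num_steps=T, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal level per timestep; index 0 means 'no noise'.

    A linear beta schedule is assumed here purely for illustration.
    """
    alpha_bar = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Standard forward diffusion: sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for v in x0]


def dual_stream_inputs(image, intrinsics, mode, alpha_bar, rng):
    """Pick one timestep per stream according to the task.

    mode == "render":  intrinsics stay clean (t = 0), the image is noised.
    mode == "inverse": the image stays clean (t = 0), intrinsics are noised.
    """
    max_t = len(alpha_bar) - 1
    if mode == "render":
        t_img, t_int = rng.randint(1, max_t), 0
    elif mode == "inverse":
        t_img, t_int = 0, rng.randint(1, max_t)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    noisy_img = add_noise(image, t_img, alpha_bar, rng)
    noisy_int = add_noise(intrinsics, t_int, alpha_bar, rng)
    return noisy_img, t_img, noisy_int, t_int


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = make_alpha_bar()
    image = [rng.random() for _ in range(12)]       # stand-in for RGB pixels
    intrinsics = [rng.random() for _ in range(20)]  # stand-in for material/lighting maps
    _, t_img, _, t_int = dual_stream_inputs(image, intrinsics, "render", alpha_bar, rng)
    print("render: t_img =", t_img, "t_int =", t_int)
```

Under this convention, a denoiser that receives both streams together with both timesteps can be trained once and then queried in either direction, which is what lets rendering and inverse rendering share one set of weights.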
