Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion

Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng, Shunsi Zhang, Yingcong Chen

TL;DR

Uni-Renderer addresses the ill-posed dual task of rendering and inverse rendering by unifying them in a single dual-stream diffusion framework. It uses two distinct timesteps and a cycle-consistent loss to enable cross-conditioning between intrinsic properties and rendered images, mitigating ambiguity in intrinsic decomposition. The method is trained on a large synthetic dataset with varied metallic, roughness, and lighting attributes, and it demonstrates strong performance in both rendering and inverse rendering, including relighting and real-world generalization. Ablations show the benefits of a unified model and of cycle-consistency. This approach offers a practical pathway to robust intrinsic property disentanglement and photo-realistic rendering in real-world scenarios, while acknowledging that the domain gap remains and that integrating real-world data is left to future work.

Abstract

Rendering and inverse rendering are pivotal tasks in both computer vision and graphics. The rendering equation is the core of the two tasks, as an ideal conditional distribution transfer function from intrinsic properties to RGB images. Although existing rendering methods achieve promising results, they merely approximate the ideal estimation for a specific scene and come with a high computational cost. Additionally, the inverse conditional distribution transfer is intractable due to its inherent ambiguity. To address these challenges, we propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks within a single diffusion framework. Inspired by UniDiffuser, we utilize two distinct time schedules to model both tasks, and with a tailored dual streaming module, we achieve cross-conditioning of two pre-trained diffusion models. This unified approach, named Uni-Renderer, allows the two processes to facilitate each other through a cycle-consistent constraint, mitigating ambiguity by enforcing consistency between intrinsic properties and rendered images. Combined with a meticulously prepared dataset, our method effectively decomposes intrinsic properties and demonstrates a strong capability to recognize changes during rendering. We will open-source our training and inference code to the public, fostering further research and development in this area.

Paper Structure

This paper contains 23 sections, 5 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Our framework, Uni-Renderer, empowers the generative model to function as both a renderer and an inverse renderer by approximating the rendering equation with a data-driven approach. Given intrinsic attributes, Uni-Renderer generates photo-realistic images, functioning as a renderer. When provided with a single RGB image, it decomposes the image into its intrinsic properties, functioning as an inverse renderer. Top: Uni-Renderer generates smooth variations as a renderer. Setting the roughness value to $1.0$ removes the specular highlights in the "dog" case, while setting the metallic value to $1.0$ makes the "hat" case appear metallic. Bottom Left: As an inverse renderer, Uni-Renderer decomposes the intrinsic properties of a single RGB image. Bottom Right: Uni-Renderer generates relighting results under different environment lightings.
  • Figure 2: Overview of our pipeline. During training, both attribute and RGB images are input to a unified model with pre-trained VAE encoders. The timestep selector plays a crucial role by adjusting the timesteps for each branch. Specifically, it ensures that one branch (either attribute or RGB) has a timestep of 0, while the other branch samples a timestep $t \in [0, T]$. This mechanism allows our model to learn the conditional distributions $q(\mathbf{x_0}|\mathbf{y_0})$ and $q(\mathbf{y_0}|\mathbf{x_0})$ in alternating iterations. During rendering and inverse rendering, the corresponding conditions are input to the model with a timestep of 0, and the attributes or RGB images are generated from sampled noise. (The VAE encoder and decoder are omitted for simplicity.)
  • Figure 3: We demonstrate smooth changes via rendering for different metallic and roughness strengths. The rendering is performed given different combinations of the attributes. When the roughness value is set to 1, the cake and clock cases shown in the top left have no specular highlights. When the metallic value is set to 1, the orange and baseball cases appear metallic and reveal object illumination. Best viewed in color.
  • Figure 4: Albedo comparison of Uni-Renderer with baseline methods. We compare against 4 learning-based methods and 2 optimization-based methods. Among all, Uni-Renderer yields the most realistic results. Best viewed in color.
  • Figure 5: Qualitative rendering comparison of Uni-Renderer with baseline methods. The base images have a metallic value of 0.5; the comparison is made at a higher metallic value of 1.0. Best viewed in color.
  • ...and 6 more figures
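
The timestep selector described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_timesteps`, the value `T = 1000`, the dictionary keys, and the choice to alternate tasks by iteration parity are all assumptions made for illustration. The caption only states that one branch is held at timestep 0 while the other samples $t \in [0, T]$, and that the two conditional distributions are learned in alternating iterations.

```python
import random

T = 1000  # assumed number of diffusion steps; the caption only names T


def select_timesteps(iteration, rng, num_steps=T):
    """Return per-branch timesteps for one training iteration (hypothetical helper).

    The branch held at timestep 0 stays clean and acts as the condition.
    The other branch gets a timestep sampled from [0, num_steps] and is
    the one to be denoised.
    """
    t = rng.randint(0, num_steps)  # random.randint includes both endpoints
    if iteration % 2 == 0:
        # Attributes are clean, RGB is noised: the rendering direction.
        return {"task": "rendering", "t_attr": 0, "t_rgb": t}
    # RGB is clean, attributes are noised: the inverse rendering direction.
    return {"task": "inverse_rendering", "t_attr": t, "t_rgb": 0}


rng = random.Random(0)
schedule = [select_timesteps(i, rng) for i in range(4)]
for step in schedule:
    print(step)
```

At inference time the same convention would apply in one direction only: the given condition is passed with timestep 0 and the other branch starts from sampled noise, as the caption describes.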