Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes

JunYong Choi, Min-Cheol Sagong, SeokYeong Lee, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho

TL;DR

The paper tackles the ill-posed problem of single-image inverse rendering with two diffusion models: PDM, which generates diverse plausible solutions, and SDM, which predicts a single accurate solution. Both are guided by channel-wise noise scheduling, which partitions generation across geometry, material, and lighting. A cascaded, low-resolution diffusion backbone jointly predicts per-pixel attributes and a neural ILR-based lighting representation, and an RGB-guided SRM recovers high-resolution outputs. Results on the synthetic OpenRooms FF dataset and on real-world spatially varying lighting datasets show that SDM excels in low-ambiguity scenes, while PDM provides useful diversity in complex, ambiguous regions; together they improve downstream tasks such as object insertion and material editing. The approach highlights the importance of modeling inter-modality dependencies and per-pixel lighting in inverse rendering, with practical benefits for photorealistic scene editing and relighting. Unifying the two models and making further use of pretrained priors are left to future work.

Abstract

We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative model-based methods aim to present a range of possible solutions. However, finding a single accurate solution and generating diverse solutions can conflict. In this paper, we propose a channel-wise noise scheduling approach that allows a single diffusion model architecture to achieve both objectives. The resulting two diffusion models, trained with different channel-wise noise schedules, respectively predict a single highly accurate solution and present multiple possible solutions. The experimental results demonstrate the superiority of our two models in terms of both diversity and accuracy, which translates to enhanced performance in downstream applications such as object insertion and material editing.

Paper Structure

This paper contains 28 sections, 6 equations, 22 figures, 9 tables.

Figures (22)

  • Figure 1: Diffusion-Based Inverse Rendering. We present a diffusion-based inverse rendering framework that addresses the two competing goals of accuracy and diversity in inverse rendering by using two distinct models, each dedicated to achieving one of the objectives. (a) Our first model predicts a single accurate solution, while (b) the second model presents diverse possible solutions. (c) This enables practical applications such as object insertion and material editing (e.g., increasing the roughness of the brown table).
  • Figure 2: Entire pipeline. Our channel-wise noise scheduling assists the DM's inference by adjusting the transitions between modalities based on the timestep. Since the DM operates at a low resolution, modalities are up-sampled through our SRM.
  • Figure 3: Parameterizing noise scheduler. A larger $\tau$ value results in a slower generation of the modality.
  • Figure 4: Qualitative synthetic evaluation. Only our method successfully decomposes specular radiance into material and lighting.
  • Figure 5: Qualitative real-world evaluation. Our method successfully recognizes shadows under the desk, achieving plausible geometry prediction and realistic re-rendering results.
  • ...and 17 more figures
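
The captions for Figures 2 and 3 describe a per-modality noise schedule in which a larger $\tau$ makes a modality generate more slowly. The paper's actual parameterization is not given in this summary, so the following is only a minimal sketch under stated assumptions: a cosine schedule evaluated at the warped time `t ** (1 / tau)`, with hypothetical function names (`channelwise_alpha_bar`, `add_channelwise_noise`) and a hypothetical channel-to-group mapping. It is meant to illustrate the general idea of giving each channel group its own noise level at a shared timestep, not to reproduce the authors' method.

```python
import numpy as np


def channelwise_alpha_bar(t, taus):
    """Signal level per channel group at normalized time t in [0, 1].

    ASSUMPTION: cosine schedule on warped time t ** (1 / tau). This is an
    illustrative choice, not the parameterization used in the paper.
    With it, a larger tau gives a lower signal level at the same t, so
    that group stays noisy longer during reverse (denoising) sampling.
    """
    t = np.asarray(t, dtype=np.float64)
    taus = np.asarray(taus, dtype=np.float64)
    warped = t[..., None] ** (1.0 / taus)  # shape (..., n_groups)
    return np.cos(0.5 * np.pi * warped) ** 2


def add_channelwise_noise(x0, t, taus, channel_groups, rng):
    """Forward-diffuse x0 of shape (C, H, W) with one noise level per group.

    channel_groups[c] is the (hypothetical) group index of channel c,
    e.g. 0 = geometry, 1 = material, 2 = lighting.
    """
    alpha_bar = channelwise_alpha_bar(t, taus)   # (n_groups,)
    a = alpha_bar[channel_groups][:, None, None]  # (C, 1, 1)
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((5, 8, 8))   # toy stack of 5 channels
    groups = np.array([0, 0, 1, 1, 2])    # hypothetical channel layout
    taus = [1.0, 2.0, 4.0]                # larger tau -> generated later
    print(channelwise_alpha_bar(0.5, taus))
    print(add_channelwise_noise(x0, 0.5, taus, groups, rng).shape)
```

Under this assumed warp, every group is clean at `t = 0` and fully noised at `t = 1`, and only the path between those endpoints differs per group, which is one simple way to stagger modalities along the sampling trajectory.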