Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
JunYong Choi, Min-Cheol Sagong, SeokYeong Lee, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho
TL;DR
The paper addresses single-image inverse rendering, an ill-posed problem, with two diffusion models that share one architecture but are trained with different channel-wise noise schedules: PDM, which produces diverse plausible solutions, and SDM, which produces a single accurate prediction. The channel-wise schedule partitions generation across geometry, material, and lighting. A cascaded, low-resolution diffusion backbone jointly predicts per-pixel attributes and a neural ILR-based lighting representation, and an RGB-guided SRM recovers high-resolution outputs. On the synthetic OpenRooms FF dataset and on real-world spatially varying lighting datasets, SDM performs best in low-ambiguity scenes, while PDM contributes useful diversity in complex, ambiguous regions; together they improve downstream tasks such as object insertion and material editing. The results underline the importance of modeling inter-modality dependencies and per-pixel lighting for photorealistic scene editing and relighting. Unifying the two models and making fuller use of pretrained priors are left to future work.
Abstract
We propose a diffusion-based inverse rendering framework that decomposes a single RGB image into geometry, material, and lighting. Inverse rendering is inherently ill-posed, making it difficult to predict a single accurate solution. To address this challenge, recent generative model-based methods aim to present a range of possible solutions. However, finding a single accurate solution and generating diverse solutions can conflict. In this paper, we propose a channel-wise noise scheduling approach that allows a single diffusion model architecture to achieve these two conflicting objectives. The resulting two diffusion models, trained with different channel-wise noise schedules, serve complementary roles: one predicts a single highly accurate solution, and the other presents multiple possible solutions. The experimental results demonstrate the superiority of our two models in terms of both diversity and accuracy, which translates to enhanced performance in downstream applications such as object insertion and material editing.
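To make the idea of channel-wise noise scheduling concrete, the following is a minimal sketch of a forward noising step in which each channel group follows its own schedule. It is an illustration under stated assumptions, not the authors' implementation: the channel layout (4 geometry, 4 material, 3 lighting channels), the per-group time windows, the cosine-shaped schedule, and the names `CHANNEL_GROUPS`, `cosine_alpha_bar`, and `add_channelwise_noise` are all hypothetical.

```python
import numpy as np

# Hypothetical channel layout and per-group time windows (assumptions for
# illustration only). Each group is noised only while the global time t moves
# through its (start, end) window.
CHANNEL_GROUPS = {
    "geometry": (slice(0, 4), (0.6, 1.0)),   # e.g. normal (3) + depth (1)
    "material": (slice(4, 8), (0.3, 0.7)),   # e.g. albedo (3) + roughness (1)
    "lighting": (slice(8, 11), (0.0, 0.4)),  # e.g. a 3-channel lighting code
}


def cosine_alpha_bar(local_t):
    """Cumulative signal level for a local time in [0, 1] (1 = clean, 0 = noise)."""
    return np.cos(0.5 * np.pi * np.clip(local_t, 0.0, 1.0)) ** 2


def add_channelwise_noise(x0, t, rng):
    """Noise a (C, H, W) array at global time t with per-channel-group schedules.

    Returns the noised array and the Gaussian noise that was mixed in.
    """
    noise = rng.standard_normal(x0.shape)
    alpha_bar = np.empty(x0.shape[0])
    for channels, (start, end) in CHANNEL_GROUPS.values():
        # Map the global time into this group's own window.
        local_t = np.clip((t - start) / (end - start), 0.0, 1.0)
        alpha_bar[channels] = cosine_alpha_bar(local_t)
    a = alpha_bar[:, None, None]  # broadcast one value per channel over H and W
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x_t, noise


rng = np.random.default_rng(0)
x0 = rng.standard_normal((11, 8, 8))
x_t, noise = add_channelwise_noise(x0, t=0.5, rng=rng)
```

With these windows, at global time t = 0.5 the geometry channels are still clean, the material channels are partially noised, and the lighting channels are pure noise. Run in reverse, such a schedule would resolve geometry first, then material, then lighting. Changing the windows per model is one plausible way for a shared architecture to be trained toward different behaviors.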
