
Bridging Physically Based Rendering and Diffusion Models with Stochastic Differential Equation

Junwei Shu, Wenjie Liu, Changgu Chen, Hantang Liu, Yang Li, Changbo Wang

TL;DR

This paper proposes a unified stochastic formulation that bridges Monte Carlo rendering and diffusion-based generative modeling and provides a systematic analysis of how the physical characteristics of path tracing can be extended to existing diffusion models from the perspective of noise variance.

Abstract

Diffusion-based image generators excel at producing realistic content from text or image conditions, but they offer only limited explicit control over low-level, physically grounded shading and material properties. In contrast, physically based rendering (PBR) offers fine-grained physical control but lacks prompt-driven flexibility. Although these two paradigms originate in distinct communities, both follow a common progression: from noisy observations to clean images. In this paper, we propose a unified stochastic formulation that bridges Monte Carlo rendering and diffusion-based generative modeling. First, we model Monte Carlo integration under the Central Limit Theorem as a general stochastic differential equation (SDE). By instantiating it with physically based path tracing, we convert it into a physically grounded SDE representation. We then provide a systematic analysis, from the perspective of noise variance, of how the physical characteristics of path tracing can be extended to existing diffusion models. Extensive experiments on multiple tasks, including rendering and material editing, show that our method can exert physically grounded control over diffusion-generated results.
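The Central Limit Theorem step above can be made concrete with a short worked derivation. This is a sketch under stated assumptions, not the paper's full SDE construction: it assumes an unbiased per-pixel estimator whose per-sample variance $s^2 = \operatorname{Var}[f(X)/p(X)]$ is finite, and it identifies the variance time with $\tau = 1/\sqrt{N}$, consistent with the $t^*(\frac{1}{\sqrt{N}})$ mapping described for Figure 4.

$$
\begin{aligned}
\hat{\mu}_N &= \frac{1}{N}\sum_{i=1}^{N} \frac{f(X_i)}{p(X_i)}, \qquad
\mathbb{E}[\hat{\mu}_N] = \mu, \qquad
\operatorname{Var}[\hat{\mu}_N] = \frac{s^2}{N},\\
\hat{\mu}_N &\;\overset{d}{\approx}\; \mu + \frac{s}{\sqrt{N}}\,Z, \qquad Z \sim \mathcal{N}(0,1) \quad \text{(CLT)},\\
\tau &:= \frac{1}{\sqrt{N}} \;\Rightarrow\; Y(\tau) \;\overset{d}{\approx}\; \mu + s\,\tau\,Z,\\
\text{VE marginal:}\quad \mathbf{x}_t &= \mathbf{x}_0 + \sigma(t)\,\boldsymbol{\epsilon}
\;\Rightarrow\; \sigma(t) = s\,\tau, \qquad
\mathrm{SNR}(t) = \frac{\operatorname{Var}[\mathbf{x}_0]}{s^2\,\tau^2} \propto \frac{1}{\tau^2}.
\end{aligned}
$$

Matching the two marginals term by term ($\mathbf{x}_0 \leftrightarrow \mu$, $\sigma(t)\,\boldsymbol{\epsilon} \leftrightarrow s\,\tau\,Z$) recovers the $\sigma(t)\propto\tau$ and $\mathrm{SNR}\propto 1/\tau^{2}$ relations stated for Figure 2.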

Paper Structure (38 sections, 73 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Monte Carlo sampling-based rendering (first row) vs. denoising models (second row). Both share a similar formulation: a process in which noise decreases from high to low.
  • Figure 2: MC--SDE $\leftrightarrow$ Diffusion-SDE: variance-time alignment and unified noise. Top (Path Tracing). Monte Carlo estimation is cast as a continuous process $Y(\tau)$ indexed by the variance time $\tau$ (large $\tau$: few samples, high variance; $\tau\!\downarrow\!0$: noise-free), contracting toward the mean radiance $\mu$. Middle (Reverse SDE). The evolution matches the reverse dynamics of a clean-start VE process with noise scale $g(\tau)$; this provides a principled alignment between $\tau$ and diffusion-model time $t$ by matching variance/SNR. Bottom (Diffusion Model). Using the $\tau\!\leftrightarrow\!t$ mapping, a pretrained diffusion model is sampled along the matched schedule $g(t)$; we instantiate path tracing with a unified noise source (shared random draws for diffuse/specular), so high-variance specular content dominates early denoising and the process converges to a clean image. Summary. We map $\tau$ to $t$ via variance/SNR matching, with $\sigma(t)\!\propto\!\tau$ and $\mathrm{SNR}\!\propto\!1/\tau^{2}$, establishing the correspondence $g(\tau)\!\leftrightarrow\!g(t)$ and enabling physically interpretable schedules.
  • Figure 3: Effect of stage order on appearance and color statistics. Top-left inset: reference render $I_S$. We edit the reference image into two different materials, a specular one $\mathbf{x}_s$ (left) and a diffuse one $\widetilde{\mathbf{x}}_s$ (right). The bottom row plots the normalized color histogram $f_{CH}(\cdot)$ (RGB aggregated). Colors: blue $= f_{CH}(I_S)$, orange $= f_{CH}(\mathbf{x}_s)$, green $= f_{CH}(\widetilde{\mathbf{x}}_s)$. Allocating a larger denoising range to the high-variance specular stage (left) produces metallic highlights and shadows and a sharper, heavier-tailed histogram; reversing the order compresses that range (right), yielding a smoother, lower-contrast histogram. The gray band marks intensity zones where our MC--SDE prior predicts a transition from higher to lower variance.
  • Figure 4: The diffusion model takes over noisy path-traced images. The top-left image is a high-spp path-traced render used as ground truth. $N$ is the sample count of the path-traced image in each column. The first row shows the input to the diffusion model. In the second row, we initialize the baseline latent noise with the image at the corresponding sample count and, regardless of that sample count, run the complete denoising process. In the third row, we use the $t^*(\tau)$ mapper to inject each image at timestep $t^*(\frac{1}{\sqrt{N}})$, determined by its sample count $N$. In the fourth row, we additionally use an adapter in latent space to transform the noise distribution from the Monte Carlo domain to the diffusion domain. In summary, our methods in rows 3 and 4 enable the diffusion model to interpret noisy path-traced images: directly mapping $\tau$ to $t$ restores shape better, while the extra adapter yields better color in the diffusion noise space. All guidance scales are set to 1.0 to isolate the generation result from prompt effects.
  • Figure 5: A simple framework for fine-grained material tuning. The first row shows increasing roughness $r$ from left to right; the second row shows increasing metallic $m$, also from left to right. We use a simple trick to map the PBR parameters into the diffusion model's text space.
  • ...and 4 more figures
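The variance-time alignment described in the Figure 2 and Figure 4 captions (a sample count $N$ mapped to $\tau = 1/\sqrt{N}$, then to a diffusion time $t^*$ with $\sigma(t)\propto\tau$) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a continuous geometric VE schedule with hypothetical bounds $\sigma_{\min}=0.01$ and $\sigma_{\max}=50$, plus a hypothetical constant `scale` that fixes the proportionality. The names `ve_sigma`, `tau_from_spp`, `t_star`, `snr`, and `empirical_mc_std` are illustrative only. The last function checks the CLT scaling numerically on a toy 1D integral.

```python
import math
import random
import statistics

# Hypothetical VE schedule bounds (assumed for illustration, not from the paper).
SIGMA_MIN = 0.01
SIGMA_MAX = 50.0


def ve_sigma(t: float) -> float:
    """Geometric VE noise level: sigma(t) = SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t, t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def tau_from_spp(n_samples: int) -> float:
    """Variance time tau = 1 / sqrt(N), the CLT standard-error scaling of an N-sample MC estimate."""
    if n_samples < 1:
        raise ValueError("sample count must be >= 1")
    return 1.0 / math.sqrt(n_samples)


def t_star(tau: float, scale: float = 1.0) -> float:
    """Invert ve_sigma so that sigma(t*) = scale * tau, clipped to [0, 1].

    `scale` is a hypothetical constant (e.g. the per-sample radiance standard
    deviation) that fixes the proportionality sigma(t) ∝ tau.
    """
    target = scale * tau
    if target <= SIGMA_MIN:
        return 0.0  # effectively noise-free: start at the clean end
    if target >= SIGMA_MAX:
        return 1.0  # noisier than the schedule allows: start from pure noise
    return math.log(target / SIGMA_MIN) / math.log(SIGMA_MAX / SIGMA_MIN)


def snr(t: float, data_var: float = 1.0) -> float:
    """Signal-to-noise ratio of the VE marginal x_t = x_0 + sigma(t) * eps."""
    return data_var / ve_sigma(t) ** 2


def empirical_mc_std(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Std. dev. of an N-sample MC estimate of the integral of u^2 over [0, 1].

    This toy integral stands in for a single pixel. With uniform sampling the
    per-sample std is sqrt(4/45), so the CLT predicts sqrt(4/45) / sqrt(N).
    """
    rng = random.Random(seed)
    estimates = [
        sum(rng.random() ** 2 for _ in range(n_samples)) / n_samples
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    # Usage: map a few sample counts to diffusion start times.
    for n in (1, 4, 16, 64, 1024):
        t = t_star(tau_from_spp(n), scale=5.0)
        print(f"N={n:5d}  tau={tau_from_spp(n):.4f}  t*={t:.3f}  sigma={ve_sigma(t):.4f}")
```

Because $\sigma(t^*)$ is exactly proportional to $\tau$ (inside the clipping range), quadrupling $N$ halves $\sigma$ and quadruples the SNR, matching $\mathrm{SNR}\propto 1/\tau^{2}$. Many latent diffusion models use a discrete, variance-preserving schedule instead; in that case the same matching would be done by searching the model's own noise table rather than inverting a closed-form $\sigma(t)$.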