DiffRF: Rendering-Guided 3D Radiance Field Diffusion

Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Matthias Nießner

TL;DR

DiffRF generates 3D radiance fields by applying denoising diffusion probabilistic models directly to explicit voxel-grid radiance fields. A time-conditioned 3D-UNet denoiser is trained with two losses: a noise prediction loss on the radiance field and a rendering loss, computed through volumetric rendering, that biases denoising toward image quality rather than toward reproducing fitting artifacts. The learned prior is multi-view consistent and supports unconditional 3D synthesis as well as conditional tasks at inference time, such as masked radiance-field completion. The paper reports better results than the GAN-based baselines $\pi$-GAN and EG3D on the PhotoShape Chairs and ABO Tables datasets. The approach extends diffusion-based generation to radiance-field representations with coherent geometry and appearance, with applications in 3D content creation. Future directions include faster sampling, higher-resolution grids, and adaptive representations to improve efficiency and fidelity.

Abstract

We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. While existing diffusion-based methods operate on images, latent codes, or point cloud data, we are the first to directly generate volumetric radiance fields. To this end, we propose a 3D denoising model which directly operates on an explicit voxel grid representation. However, as radiance fields generated from a set of posed images can be ambiguous and contain artifacts, obtaining ground truth radiance field samples is non-trivial. We address this challenge by pairing the denoising formulation with a rendering loss, enabling our model to learn a deviated prior that favours good image quality instead of trying to replicate fitting errors like floating artifacts. In contrast to 2D-diffusion models, our model learns multi-view consistent priors, enabling free-view synthesis and accurate shape generation. Compared to 3D GANs, our diffusion-based approach naturally enables conditional generation such as masked completion or single-view 3D synthesis at inference time.
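
The pairing of the denoising formulation with a rendering loss can be written out using the standard DDPM forward process. This is a sketch under stated assumptions, not the paper's exact equations: $\bar{\alpha}_t$ (cumulative noise schedule), $\epsilon_\theta$ (the 3D denoiser), $R(\cdot, v)$ (volumetric rendering from viewpoint $v$), $I_v$ (the corresponding posed image), the squared-error form of the rendering loss, and the weight $\lambda$ are notation introduced here for illustration.

```latex
\begin{align}
f_t &= \sqrt{\bar{\alpha}_t}\, f_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I), \\
L_\mathtt{RF} &= \lVert \epsilon - \epsilon_\theta(f_t, t) \rVert_2^2, \\
\tilde{f}_0 &= \frac{f_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(f_t, t)}{\sqrt{\bar{\alpha}_t}}, \\
L_\mathtt{RGB} &= \lVert R(\tilde{f}_0, v) - I_v \rVert_2^2, \\
L &= L_\mathtt{RF} + \lambda\, L_\mathtt{RGB}.
\end{align}
```

Because $L_\mathtt{RGB}$ is evaluated on renderings of the predicted denoising $\tilde{f}_0$ rather than on the fitted field itself, it pulls the learned prior toward fields that render well even when the training fields contain fitting artifacts.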

Paper Structure

This paper contains 11 sections, 14 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Our method performs denoising of a probabilistic diffusion process applied to 3D radiance fields. Guided by 3D supervision and volumetric rendering, our model enables unconditional synthesis of high-fidelity 3D assets (left). We further introduce the novel application of masked completion (right), i.e., the task of recovering shape and appearance from incomplete objects (highlighted in light-blue on the top right chair), solved by our model as conditional inference without task-specific training.
  • Figure 2: For a time step $t$ uniformly sampled from $\{1,\dots,T\}$, we first diffuse an initial radiance field $f_0$ according to a fixed noising schedule. The resulting $f_t$ is passed through a time-conditioned 3D-UNet, giving an estimate of the applied noise $\epsilon$. We guide the model by the noise prediction loss $L_\mathtt{RF}$ as well as a rendering loss $L_\mathtt{RGB}$ on the predicted denoising $\tilde{f}_0$.
  • Figure 3: Qualitative comparison between $\pi$-GAN [chanmonteiro2020pi-GAN], EG3D [Chan2022], and our method on PhotoShape Chairs [photoshape2018]. Our approach leads to diverse, geometrically accurate models that allow for high-quality renderings.
  • Figure 4: Qualitative comparison between $\pi$-GAN [chanmonteiro2020pi-GAN], EG3D [Chan2022], and our method on ABO Tables [collins2022abo]. Our approach generates high-quality and diverse samples with accurate geometry.
  • Figure 5: Qualitative completion of masked chairs from PhotoShape [photoshape2018]. DiffRF shows more diverse proposals compared to EG3D, while also maintaining the original non-masked regions.
  • ...and 2 more figures
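
The training step summarized in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The linear noise schedule, all function names, and the flat-list stand-in for the voxel grid are assumptions made for the example, and the 3D-UNet is replaced by an oracle that returns the true noise. `composite_ray` uses the standard volumetric rendering quadrature along a single ray, which is the kind of operation a rendering loss would be built on.

```python
import math
import random


def make_alpha_bar(T, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear schedule (assumed)."""
    alpha_bar, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def diffuse(f0, t, alpha_bar, rng):
    """Forward process: noise f0 to step t (1-indexed); returns (f_t, eps)."""
    a = alpha_bar[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in f0]
    f_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(f0, eps)]
    return f_t, eps


def predict_f0(f_t, eps_hat, t, alpha_bar):
    """Recover the predicted denoising from f_t and a noise estimate."""
    a = alpha_bar[t - 1]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(f_t, eps_hat)]


def mse(a, b):
    """Mean squared error between two equally long flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def composite_ray(sigmas, colors, delta):
    """Alpha-composite density/color samples along one ray, front to back."""
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, color in zip(sigmas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel


if __name__ == "__main__":
    rng = random.Random(0)
    T = 1000
    alpha_bar = make_alpha_bar(T)

    # Hypothetical 4x4x4 grid with 4 channels per voxel, flattened.
    f0 = [rng.uniform(-1.0, 1.0) for _ in range(4 * 4 * 4 * 4)]

    t = rng.randint(1, T)            # t sampled uniformly from {1, ..., T}
    f_t, eps = diffuse(f0, t, alpha_bar, rng)

    eps_hat = eps                    # oracle stand-in for the 3D-UNet output
    loss_rf = mse(eps_hat, eps)      # noise prediction loss
    f0_tilde = predict_f0(f_t, eps_hat, t, alpha_bar)

    print("t =", t)
    print("noise prediction loss:", loss_rf)
    print("reconstruction error:", mse(f0_tilde, f0))
```

In an actual training loop, `eps_hat` would come from the time-conditioned denoiser, and the rendering loss would compare pixels composited from $\tilde{f}_0$ against the posed training images.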