DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF

Jie Long Lee, Chen Li, Gim Hee Lee

TL;DR

DiSR-NeRF addresses NeRF super-resolution from LR multi-view data by distilling 2D diffusion priors into a 3D NeRF. It introduces Iterative 3D Synchronization (I3DS) to alternate between diffusion-based upscaling of LR renders and NeRF training to enforce view-consistency, and Renoised Score Distillation (RSD) to blend ancestral sampling and SDS for sharp LR-consistent details. It achieves state-of-the-art qualitative and quantitative results on NeRF-Synthetic and LLFF without requiring high-resolution references. The approach lowers data requirements while enabling practical SR NeRF for low-resolution capture devices, improving cross-view detail fidelity.

Abstract

We present DiSR-NeRF, a diffusion-guided framework for view-consistent super-resolution (SR) NeRF. Unlike prior works, we circumvent the requirement for high-resolution (HR) reference images by leveraging existing powerful 2D super-resolution models. Nonetheless, independent SR 2D images are often inconsistent across different views. We thus propose Iterative 3D Synchronization (I3DS) to mitigate the inconsistency problem via the inherent multi-view consistency property of NeRF. Specifically, our I3DS alternates between upscaling low-resolution (LR) rendered images with diffusion models, and updating the underlying 3D representation with standard NeRF training. We further introduce Renoised Score Distillation (RSD), a novel score-distillation objective for 2D image resolution. Our RSD combines features from ancestral sampling and Score Distillation Sampling (SDS) to generate sharp images that are also LR-consistent. Qualitative and quantitative results on both synthetic and real-world datasets demonstrate that our DiSR-NeRF can achieve better results on NeRF super-resolution compared with existing works. Code and video results available at the project website.

Paper Structure (33 sections, 16 equations, 9 figures, 4 tables, 2 algorithms)

This paper contains 33 sections, 16 equations, 9 figures, 4 tables, 2 algorithms.

Figures (9)

  • Figure 1: Our DiSR-NeRF distils super-resolution priors from a 2D diffusion upscaler to generate high-quality details from low-resolution NeRFs.
  • Figure 2: I3DS splits upscaling and NeRF fitting into separate, alternating stages. NeRF renders are upscaled via RSD in the upscaling stage, and the upscaled images are then used as training images to learn view-consistent details. The two-stage process is repeated over several cycles to achieve detail convergence.
  • Figure 3: Our RSD produces LR-consistent HR details by optimizing $\mathbf{z}'_{t-1}$ towards predicted denoised latents $\hat{\mathbf{z}}_{t-1}$ following a linearly decreasing time schedule. After optimization, the residuals $\mathbf{h}_\theta$ contain HR details that are added to $\mathbf{z}_0$ to obtain upscaled latents $\mathbf{z}'_0$, which are decoded into LR-consistent upscaled images $\mathbf{x}'_0$. Refer to the text in Sec. \ref{subsection:RSD} for more details.
  • Figure 4: Qualitative Results on NeRF-Synthetic Dataset.
  • Figure 5: Qualitative Results on LLFF Dataset.
  • ...and 4 more figures
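
The alternating two-stage loop described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: the scene is reduced to a single shared vector, the diffusion upscaler is replaced by a hypothetical stand-in that pulls each view toward a plausible detail target while adding independent per-view noise, and NeRF training is replaced by a least-squares fit of the shared vector (the mean across views). The names `render`, `upscale_views`, `fit_scene`, and `i3ds`, and all parameter values, are assumptions made for illustration only.

```python
import numpy as np

def render(scene, n_views):
    # Toy "NeRF render": every view observes the same shared scene vector,
    # so renders are view-consistent by construction.
    return np.tile(scene, (n_views, 1))

def upscale_views(renders, target_detail, rng, noise_scale=0.5, step=0.5):
    # Hypothetical stand-in for a 2D diffusion upscaler (an assumption, not RSD):
    # each view moves toward plausible detail, but independent noise makes the
    # per-view results disagree with each other.
    noise = rng.normal(scale=noise_scale, size=renders.shape)
    return renders + step * (target_detail - renders) + noise

def fit_scene(upscaled):
    # Stand-in for NeRF training: the least-squares fit of one shared vector
    # to all views is their mean, which discards view-inconsistent detail.
    return upscaled.mean(axis=0)

def i3ds(lr_scene, target_detail, n_views=64, n_cycles=6, seed=0):
    rng = np.random.default_rng(seed)
    scene = lr_scene.copy()
    disagreement = []
    for _ in range(n_cycles):
        # Upscaling stage: render the current scene and upscale each view.
        upscaled = upscale_views(render(scene, n_views), target_detail, rng)
        # Record how much the independently upscaled views disagree.
        disagreement.append(float(upscaled.std(axis=0).mean()))
        # Fitting stage: synchronize the views into one shared representation.
        scene = fit_scene(upscaled)
    return scene, disagreement

lr = np.zeros(16)       # toy low-resolution scene
target = np.ones(16)    # toy "true" high-resolution detail
scene, disagreement = i3ds(lr, target)
print("mean error before:", np.abs(lr - target).mean())
print("mean error after: ", np.abs(scene - target).mean())
```

In this toy setting, the upscaled views disagree in every cycle, yet the fitted scene always renders identically from all views, and repeating the cycle moves it closer to the detail target. That mirrors, in a very simplified way, why the fitting stage is used to synchronize independently upscaled images.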