fNeRF: High Quality Radiance Fields from Practical Cameras

Yi Hua, Christoph Lassner, Carsten Stoll, Iain Matthews

TL;DR

This paper addresses the mismatch between neural radiance fields and real-world camera optics by introducing a finite-aperture rendering model, enabling defocus-aware image formation for radiance-field reconstruction. The core method, ƒNeRF, casts multiple rays from the camera aperture toward a focus plane, integrates over the aperture, and provides an analytic gradient for the aperture radius to jointly optimize aperture and focus depth. Empirically, it yields sharper reconstructions and up to about 3 dB improvements in PSNR on all-in-focus views across synthetic and real datasets, outperforming pinhole-based NeRFs and aperture-augmented baselines, while remaining computationally tractable. This approach broadens the practical applicability of radiance-field methods to real cameras, with potential extensions to more expressive lens models and aberration effects.
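
The aperture sampling described above can be sketched in a few lines of NumPy. This is a minimal illustration under a thin-lens assumption, not the authors' implementation: the function names (`sample_disk`, `aperture_rays`, `render_pixel`), the camera-frame convention (lens centre at the origin, optical axis along +z), and the uniform disk sampling are all assumptions made for the example, and `radiance_fn` is a hypothetical stand-in for whatever radiance-field renderer returns a color for a single ray.

```python
import numpy as np

def sample_disk(n, radius, rng):
    """Draw n area-uniform samples on a disk of the given radius."""
    r = radius * np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

def aperture_rays(pinhole_dir, focus_dist, aperture_radius, n_rays, rng):
    """Build rays that start on the aperture and meet on the focus plane.

    Assumed camera frame: lens centre at the origin, optical axis along +z,
    focus plane at z = focus_dist.
    """
    d = np.asarray(pinhole_dir, dtype=float)
    # Point where the original pinhole ray crosses the focus plane.
    focus_point = d * (focus_dist / d[2])
    # Aperture samples a, lifted to 3D on the lens plane z = 0.
    a = sample_disk(n_rays, aperture_radius, rng)
    origins = np.concatenate([a, np.zeros((n_rays, 1))], axis=-1)
    # Each modified ray points from its aperture sample to the focus point.
    dirs = focus_point - origins
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    return origins, dirs

def render_pixel(radiance_fn, origins, dirs):
    """Monte Carlo estimate of the aperture integral: average per-ray colors."""
    colors = np.stack([radiance_fn(o, d) for o, d in zip(origins, dirs)])
    return colors.mean(axis=0)
```

All rays for a pixel converge at the same point on the focus plane, so scene content at that depth stays sharp, while content nearer or farther is seen from slightly different positions and averages into blur. Setting `aperture_radius` to zero collapses every ray onto the pinhole ray.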

Abstract

In recent years, the development of Neural Radiance Fields has enabled a previously unseen level of photo-realistic 3D reconstruction of scenes and objects from multi-view camera data. However, previous methods use an oversimplified pinhole camera model, resulting in defocus blur being 'baked' into the reconstructed radiance field. We propose a modification to the ray casting that leverages the optics of lenses to enhance scene reconstruction in the presence of defocus blur. This allows us to improve the quality of radiance field reconstructions from the measurements of a practical camera with finite aperture. We show that the proposed model matches the defocus blur behavior of practical cameras more closely than pinhole models and other approximations of defocus blur models, particularly in the presence of partial occlusions. This allows us to achieve sharper reconstructions, improving the PSNR on validation of all-in-focus images, on both synthetic and real datasets, by up to 3 dB.
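
To put the "up to 3 dB" figure in perspective, the short sketch below uses the standard PSNR definition, 10 log10(MAX^2 / MSE), to convert a PSNR gain into the corresponding reduction in mean squared error. The helper names `psnr` and `mse_ratio_for_gain` are illustrative, and the assumption that images are scaled to [0, 1] is made for the example; the paper's exact evaluation protocol is not given here.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Because PSNR is logarithmic in the error, a 3 dB gain corresponds to roughly halving the mean squared error against the all-in-focus reference (10^(3/10) is about 2).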

Paper Structure (22 sections, 8 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Synthetic defocus on the MipNeRF360 'bicycle' scene [barron2022mipnerf360]. Left: rendered with ZipNeRF [barron2023zipnerf] sampling cones modified to account for a larger aperture; right: the proposed ƒNeRF. The modified ZipNeRF forward model doesn't accurately model partial occlusions within a pixel, while ƒNeRF does and produces realistic blur and bokeh.
  • Figure 2: Top: The red point on the focus plane results in a sharp image, while the blue point in front of the focus plane results in a blurry image, or bokeh. Bottom: to render color at pixel $\mathbf{p}$, we draw samples $\mathbf{a}$ from the aperture and cast modified rays $(\mathbf{o}(\mathbf{a}), \mathbf{d}(\mathbf{a}))$ from it.
  • Figure 3: Schematic of sampling locations of different methods in 2D. (a) Blue points show ZipNeRF samples on a small cone with its apex at the pinhole location; (b) yellow points show ZipNeRF modified for aperture by moving the apex to the focus plane and expanding the cone to match the aperture; (c) purple points show ƒNeRF samples on 6 rays, drawn at random, passing through the aperture. ZipNeRF samples solely on a cone surface, whereas the proposed method casts rays within the cone volume. We show actual sampling locations in 3D in the supplementary video.
  • Figure 4: Synthetic data results. From left to right: input frame closest to test viewpoint, reconstruction from iNGP, ZipNeRF, ZipNeRF modified for aperture, the proposed method LensNeRF with 6 rays per pixel, LensNeRF with 32 rays per pixel, and all-in-focus ground truth. The closest input views demonstrate notably blurry regions in important areas of the image, and our reconstructed model is not able to leverage this viewpoint for the reconstruction of this viewing angle. However, other viewpoints were sufficient to reconstruct the areas in question with high fidelity and sharpness and the model is not negatively affected by the reconstruction loss thanks to our accurate depth-of-field model.
  • Figure 5: Reconstruction quality and runtime vs. number of rays per pixel on synthetic lego data. The reconstruction quality saturates near 16 rays per pixel; the runtime does not decrease significantly below 8 rays per pixel.
  • ...and 2 more figures