
Learning Neural Exposure Fields for View Synthesis

Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Christina Tsalicoglou, Keisuke Tateno, Jonathan T. Barron, Federico Tombari

TL;DR

This paper introduces Neural Exposure Fields (NExF), a novel technique for robustly reconstructing 3D scenes with high quality and 3D-consistent appearance from challenging real-world captures. It produces state-of-the-art results on several benchmarks, improving by over 55% over the best-performing baselines.

Abstract

Recent advances in neural scene representations have led to unprecedented quality in 3D reconstruction and view synthesis. Despite achieving high-quality results on common benchmarks with curated data, outputs often degrade for data that contain per-image variations such as strong exposure changes, which are present, e.g., in most scenes with indoor and outdoor areas or rooms with windows. In this paper, we introduce Neural Exposure Fields (NExF), a novel technique for robustly reconstructing 3D scenes with high quality and 3D-consistent appearance from challenging real-world captures. At its core, we propose to learn a neural field predicting an optimal exposure value per 3D point, enabling us to optimize exposure along with the neural scene representation. While capture devices such as cameras select optimal exposure per image/pixel, we generalize this concept and perform optimization in 3D instead. This enables accurate view synthesis in high dynamic range scenarios, bypassing the need for post-processing steps or multi-exposure captures. Our contributions include a novel neural representation for exposure prediction, a system for joint optimization of the scene representation and the exposure field via a novel neural conditioning mechanism, and demonstrated superior performance on challenging real-world data. We find that our approach trains faster than prior works and produces state-of-the-art results on several benchmarks, improving by over 55% over the best-performing baselines.
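
To make the motivation concrete, the following is a toy sketch of why a single exposure per image struggles in scenes with indoor and outdoor areas, and what an exposure value per 3D point buys. The linear sensor model with clipping, the radiance values, and the helper names `expose` and `ideal_exposure` are all assumptions for illustration; they are not taken from the paper, which learns the exposure field rather than computing it in closed form.

```python
def expose(radiance, dt):
    """Assumed linear sensor: pixel = clip(radiance * dt, 0, 1)."""
    return min(max(radiance * dt, 0.0), 1.0)


def ideal_exposure(radiance, target=0.5):
    """Hypothetical per-point exposure that maps the radiance to mid-grey."""
    return target / radiance


# Hypothetical radiances: a dim indoor wall and a bright outdoor area.
indoor, outdoor = 0.5, 50.0

# One exposure for the whole image: either the outdoor part saturates ...
long_exposure = [expose(indoor, 1.0), expose(outdoor, 1.0)]
# ... or the indoor part is almost black.
short_exposure = [expose(indoor, 0.01), expose(outdoor, 0.01)]

# One exposure per scene point: both parts end up well exposed.
per_point = [expose(r, ideal_exposure(r)) for r in (indoor, outdoor)]
```

In this toy setting, `long_exposure` clips the outdoor value at 1.0 and `short_exposure` pushes the indoor value close to 0, while `per_point` keeps both near mid-grey.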

Paper Structure

This paper contains 10 sections, 10 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Neural Exposure Fields. While state-of-the-art neural fields [Barron2023ZipNeRF] produce high-quality results on clean, well-curated datasets, the quality drops significantly for real-world captures if the exposure variation is ignored (a). When equipped with per-view GLO embeddings [MartinBrualla2021NeRFWild, Barron2023ZipNeRF], the results improve (b) but scene parts might be over- or underexposed. In contrast, our neural exposure field leads to high-quality 3D consistent scene appearance (c) while no manual post-processing or reference view is required.
  • Figure 3: Method Overview. Our method takes as input a set of RGB images $\{I_i\}_i$ with exposure times $\{\Delta t_i\}_i$ and outputs a neural representation that produces high-quality, well-exposed appearance in a 3D-consistent manner from arbitrary viewpoints. More specifically, during training, points are sampled along the ray, and for each point $\mathbf{x}$, viewing direction $\mathbf{d}$, and input exposure $\Delta t$, the neural field $f_\theta$ predicts a density $\sigma$ and a color $\mathbf{c}$. The final color prediction $\mathbf{c}_\text{pixel}$ is obtained via volume rendering, and the model is trained with the MSE loss on the input views with varying exposure. Similarly, the neural exposure field is trained by volume rendering the 3D predictions $\Delta \hat{t}$ to the image plane and backpropagating the reconstruction loss with respect to the input exposure, weighted by how well the pixel is exposed and saturated. At test time, the neural field $f_\theta$ is instead conditioned on the neural exposure field predictions $\Delta \hat{t}$, producing high-quality, well-exposed novel views that are consistent in 3D, requiring neither 2D tonemapping nor a target appearance produced by a professional.
  • Figure 4: Exposure Visualization. Instead of a single exposure per image, as is commonly done, we optimize a 3D neural exposure field that predicts the optimal exposure in 3D, leading to well-exposed colors for all parts of the scene (see, e.g., the short exposure (dark) for outdoor parts and the longer exposure (white) for darker indoor parts above).
  • Figure 6: Exposure Fusion. During training, each input image is observed with only a single randomly-sampled exposure (panel: varying exposure). For evaluation, we apply exposure fusion (panel: HDR clipped) to obtain higher-quality and well-exposed target images (panel: exposure fusion) compared to the default single exposure images (panel: single exposure).
  • Figure 7: Qualitative Results on Eyeful Tower. Results from our model and baselines for the office_view2 and riverview scenes.
  • ...and 3 more figures
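
The training signal described in the Method Overview caption (Figure 3) can be sketched as follows: per-point predictions along a ray are composited with standard volume-rendering weights, and the rendered exposure is compared to the input exposure with a per-pixel weight. This is a minimal sketch under stated assumptions, not the authors' implementation. The caption only says the loss is "weighted by how well the pixel is exposed and saturated", so the Gaussian `well_exposedness` weight, its parameters, and all function names here are hypothetical.

```python
import math


def render_weights(sigmas, deltas):
    """Volume-rendering weights w_k = T_k * (1 - exp(-sigma_k * delta_k))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light remaining after this sample
    return weights


def composite(weights, values):
    """Render per-sample values (color channel or exposure) to the pixel."""
    return sum(w * v for w, v in zip(weights, values))


def well_exposedness(pixel, mid=0.5, width=0.2):
    """Hypothetical weight: 1 at mid-grey, small for dark or saturated pixels."""
    return math.exp(-((pixel - mid) ** 2) / (2.0 * width ** 2))


def exposure_loss(rendered_dt, input_dt, observed_pixel):
    """Squared exposure error, down-weighted where the input pixel is badly exposed."""
    return well_exposedness(observed_pixel) * (rendered_dt - input_dt) ** 2


# Three samples along one ray: empty space, a thin surface, a dense surface.
sigmas = [0.0, 5.0, 50.0]
deltas = [0.1, 0.1, 0.1]
weights = render_weights(sigmas, deltas)

# Hypothetical per-point exposure predictions, rendered to the image plane.
rendered_dt = composite(weights, [0.2, 0.4, 0.6])

# The same exposure error counts less when the observed pixel is saturated.
loss_mid_grey = exposure_loss(rendered_dt, 1.0, observed_pixel=0.5)
loss_saturated = exposure_loss(rendered_dt, 1.0, observed_pixel=1.0)
```

With this weighting, a well-exposed pixel pulls the rendered exposure toward its input exposure, while a saturated pixel contributes much less, which is the qualitative behavior the caption describes. The same `composite` call applied to per-sample colors would give $\mathbf{c}_\text{pixel}$ for the MSE color loss.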