Bilateral Guided Radiance Field Processing

Yuehao Wang, Chaoyi Wang, Bingchen Gong, Tianfan Xue

TL;DR

This paper tackles the problem of multi-view inconsistencies caused by view-dependent image signal processing in neural radiance fields. It introduces a differentiable 3D bilateral grid to model per-view ISP effects during NeRF training and a low-rank 4D bilateral grid to lift 2D user edits to 3D radiance fields at finishing time, ensuring view-consistent enhancements. The approach jointly disentangles camera pipeline processing from NeRF optimization and provides a practical 3D editing workflow that supports HDR fusion, recoloring, and lighting disentanglement. Experiments on challenging nighttime and photometrically varied scenes demonstrate reduced floaters and improved rendering quality, with ablations validating the importance of TV regularization, low-rank constraints, and careful initialization. Overall, bilateral guided training and finishing offer a scalable and user-friendly framework for robust NeRF reconstruction and 3D-level image retouching.

Abstract

Neural Radiance Fields (NeRF) achieve unprecedented performance in novel view synthesis by exploiting multi-view consistency. When multiple inputs are captured, the image signal processing (ISP) pipeline in modern cameras enhances each of them independently through exposure adjustment, color correction, local tone mapping, and similar steps. While these processing steps greatly improve image quality, they often break the multi-view consistency assumption, leading to "floaters" in the reconstructed radiance fields. To address this concern without compromising visual aesthetics, we aim to first disentangle the ISP enhancement at the NeRF training stage and then re-apply user-desired enhancements to the reconstructed radiance fields at the finishing stage. Furthermore, to make the re-applied enhancements consistent across novel views, we need to perform image signal processing in 3D space (i.e., "3D ISP"). For this goal, we adopt the bilateral grid, a locally affine model, as a generalized representation of ISP. Specifically, we jointly optimize per-view 3D bilateral grids with the radiance field to approximate the effects of the camera pipeline for each input view. To achieve user-adjustable 3D finishing, we propose to learn a low-rank 4D bilateral grid from a given single-view edit, lifting photo enhancements to the whole 3D scene. We demonstrate that our approach can boost the visual quality of novel view synthesis by effectively removing floaters and applying enhancements from user retouching. The source code and our data are available at: https://bilarfpro.github.io.
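
To make the "locally affine model" concrete, the following is a minimal NumPy sketch of how a 3D bilateral grid might be sliced and applied to a rendered image. It is an illustration under stated assumptions, not the authors' implementation: the function names (`identity_grid`, `slice_and_apply`), the grid layout of one 3x4 affine color transform per cell, the luminance guidance with Rec. 601 weights, and the identity initialization are all hypothetical choices made for this example.

```python
import numpy as np


def identity_grid(gw, gh, gd):
    """Grid of shape (gd, gh, gw, 3, 4); every cell stores the affine map [I | 0].

    Assumption: each cell holds a 3x4 affine color transform.
    """
    grid = np.zeros((gd, gh, gw, 3, 4), dtype=np.float64)
    grid[..., :3, :3] = np.eye(3)
    return grid


def _corners(coord, n):
    """Lower/upper integer cell indices and the fractional offset along one axis."""
    lo = np.clip(np.floor(coord).astype(int), 0, n - 1)
    hi = np.minimum(lo + 1, n - 1)
    return lo, hi, coord - lo


def slice_and_apply(grid, image):
    """Slice a per-pixel affine transform from the grid and apply it.

    image: (H, W, 3) RGB in [0, 1]. Returns an (H, W, 3) array.
    Assumption: the guidance (third grid axis) is pixel luminance.
    """
    gd, gh, gw = grid.shape[:3]
    h, w, _ = image.shape
    guide = np.clip(image @ np.array([0.299, 0.587, 0.114]), 0.0, 1.0)

    # Continuous grid coordinates of every pixel.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x0, x1, fx = _corners(xs / max(w - 1, 1) * (gw - 1), gw)
    y0, y1, fy = _corners(ys / max(h - 1, 1) * (gh - 1), gh)
    z0, z1, fz = _corners(guide * (gd - 1), gd)

    # Trilinear interpolation of the 3x4 affine coefficients.
    affine = np.zeros((h, w, 3, 4))
    for zi, wz in ((z0, 1.0 - fz), (z1, fz)):
        for yi, wy in ((y0, 1.0 - fy), (y1, fy)):
            for xi, wx in ((x0, 1.0 - fx), (x1, fx)):
                affine += (wz * wy * wx)[..., None, None] * grid[zi, yi, xi]

    # Per-pixel affine color transform: out = A @ rgb + b.
    return np.einsum("hwij,hwj->hwi", affine[..., :3], image) + affine[..., 3]
```

In a training loop of the kind the abstract describes, one such grid per input view would be optimized together with the radiance field, so that per-view photometric variation is absorbed by the grid rather than by floaters in the scene. An identity-initialized grid leaves the rendering unchanged, which is one plausible starting point.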

Paper Structure (37 sections, 9 equations, 25 figures, 2 tables)

Figures (25)

  • Figure 1: Inconsistent camera processing of multi-view images leads to artifacts in novel view synthesis.
  • Figure 2: Pipeline of our proposed method. Our approach consists of two stages: 1) In the training stage, we use the 3D bilateral grid to approximate view-dependent camera enhancements on the rendering results; 2) In the finishing stage, we slice the low-rank 4D bilateral grid to apply 3D-level enhancements.
  • Figure 3: Illustration of how 3D bilateral grids guide NeRF training. Directly rendered images (a) from the radiance field do not incorporate per-view camera enhancement and do not match the input images (rendering target). With a per-view bilateral grid applied, the rendered images (b) can reproduce camera enhancement and are almost identical to input images (c).
  • Figure 4: Results of our bilateral guided NeRF finishing. During 3D-level finishing, users select a view (a) and retouch it with image editing tools (b). Our method then trains a 4D bilateral grid to close the gap between the NeRF rendering and the user's edit. Once the 4D bilateral grid is optimized, the user's retouching is not only mapped to the edited areas (c) but is also transferred to other areas (d) that are unseen in the edited view.
  • Figure 5: Comparison of our method and the ZipNeRF baseline (barron2023zipnerf) on an indoor scene with relatively minor photometric variation across the input views. Even this minor variation results in floaters in the ZipNeRF results. Our bilateral guided training can effectively disentangle the variation and overcome this issue.
  • ...and 20 more figures
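
The captions of Figures 2 and 4 refer to slicing a low-rank 4D bilateral grid in the finishing stage. The sketch below shows one way such a grid could be represented and queried, assuming a CP-style factorization with one 1D factor per axis (x, y, z, guidance) and a shared basis of affine coefficients. Whether this matches the authors' parameterization is an assumption, as are the names `interp1d`, `query_low_rank_grid`, and `apply_affine`, and the choice to store the transform as a residual around the identity.

```python
import numpy as np


def interp1d(factor, coord):
    """Linearly interpolate a 1D factor.

    factor: (R, n) values on a regular grid; coord: (N,) in [0, 1].
    Returns an (N, R) array.
    """
    n = factor.shape[1]
    c = np.clip(coord, 0.0, 1.0) * (n - 1)
    lo = np.clip(np.floor(c).astype(int), 0, n - 1)
    hi = np.minimum(lo + 1, n - 1)
    f = c - lo
    return ((1.0 - f) * factor[:, lo] + f * factor[:, hi]).T


def query_low_rank_grid(factors, basis, points, guide):
    """Query per-point 3x4 affine transforms from a rank-R 4D grid.

    factors: four arrays of shape (R, n_k) for the x, y, z, and guidance axes.
    basis:   (R, 12) affine coefficients contributed by each rank component.
    points:  (N, 3) positions normalized to [0, 1]; guide: (N,) in [0, 1].
    """
    coords = np.concatenate([points, guide[:, None]], axis=1)  # (N, 4)
    coeff = np.ones((points.shape[0], basis.shape[0]))
    for k, factor in enumerate(factors):
        coeff *= interp1d(factor, coords[:, k])  # product over the four axes
    affine = (coeff @ basis).reshape(-1, 3, 4)
    affine[:, :, :3] += np.eye(3)  # assumption: residual around the identity
    return affine


def apply_affine(affine, rgb):
    """Apply per-point affine color transforms to (N, 3) colors."""
    return np.einsum("nij,nj->ni", affine[:, :, :3], rgb) + affine[:, :, 3]
```

Under this factorization, a grid with resolution n per axis and rank R stores R * (4n + 12) numbers instead of the 12 * n^4 required by a dense 4D grid, which gives some intuition for why a low-rank constraint can help when the only supervision is a single edited view.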