Bilateral Guided Radiance Field Processing
Yuehao Wang, Chaoyi Wang, Bingchen Gong, Tianfan Xue
TL;DR
This paper tackles the problem of multi-view inconsistencies caused by view-dependent image signal processing in neural radiance fields. It introduces a differentiable 3D bilateral grid to model per-view ISP effects during NeRF training and a low-rank 4D bilateral grid to lift 2D user edits to 3D radiance fields at finishing time, ensuring view-consistent enhancements. The approach jointly disentangles camera pipeline processing from NeRF optimization and provides a practical 3D editing workflow that supports HDR fusion, recoloring, and lighting disentanglement. Experiments on challenging nighttime and photometrically varied scenes demonstrate reduced floaters and improved rendering quality, with ablations validating the importance of TV regularization, low-rank constraints, and careful initialization. Overall, bilateral guided training and finishing offer a scalable and user-friendly framework for robust NeRF reconstruction and 3D-level image retouching.
Abstract
Neural Radiance Fields (NeRF) achieve unprecedented performance in novel view synthesis by exploiting multi-view consistency. When multiple inputs are captured, the image signal processing (ISP) in modern cameras enhances each one independently, applying exposure adjustment, color correction, local tone mapping, etc. While this processing greatly improves image quality, it often breaks the multi-view consistency assumption, leading to "floaters" in the reconstructed radiance fields. To address this concern without compromising visual aesthetics, we aim to first disentangle the enhancement by ISP at the NeRF training stage and then re-apply user-desired enhancements to the reconstructed radiance fields at the finishing stage. Furthermore, to make the re-applied enhancements consistent across novel views, we need to perform image signal processing in 3D space (i.e., "3D ISP"). For this goal, we adopt the bilateral grid, a locally-affine model, as a generalized representation of ISP processing. Specifically, we jointly optimize per-view 3D bilateral grids with radiance fields to approximate the effects of camera pipelines for each input view. To achieve user-adjustable 3D finishing, we propose to learn a low-rank 4D bilateral grid from a given single-view edit, lifting photo enhancements to the whole 3D scene. We demonstrate that our approach can boost the visual quality of novel view synthesis by effectively removing floaters and applying enhancements from user retouching. The source code and our data are available at: https://bilarfpro.github.io.
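To make the "locally-affine model" concrete, the following is a minimal NumPy sketch of how a single 3D bilateral grid could be sliced and applied to one rendered view. It is an illustration under stated assumptions, not the authors' implementation: the function names `identity_grid` and `slice_and_apply`, the grid layout `(depth, height, width, 3, 4)`, and the use of luminance as the guidance channel are all hypothetical choices made here for clarity. Each grid cell stores a 3x4 affine color transform; every pixel looks up its transform by trilinear interpolation over (x, y, guidance) and applies it to its own RGB value.

```python
import numpy as np


def identity_grid(gw, gh, gd):
    """Hypothetical helper: grid of shape (gd, gh, gw, 3, 4), every cell = [I | 0]."""
    grid = np.zeros((gd, gh, gw, 3, 4), dtype=np.float64)
    grid[..., :3, :3] = np.eye(3)
    return grid


def _corners(coord, size):
    """Lower/upper integer neighbours and the fractional offset for interpolation."""
    lo = np.clip(np.floor(coord).astype(int), 0, size - 1)
    hi = np.minimum(lo + 1, size - 1)
    return lo, hi, coord - lo


def slice_and_apply(grid, rgb):
    """Slice per-pixel affine transforms from `grid` and apply them to `rgb`.

    grid: (gd, gh, gw, 3, 4) array of affine color transforms.
    rgb:  (H, W, 3) image with values in [0, 1].
    """
    gd, gh, gw = grid.shape[:3]
    H, W, _ = rgb.shape

    # Assumption: luminance serves as the guidance (third grid axis).
    guide = np.clip(rgb @ np.array([0.299, 0.587, 0.114]), 0.0, 1.0)

    # Continuous grid coordinates for every pixel.
    ys, xs = np.meshgrid(
        np.linspace(0, gh - 1, H), np.linspace(0, gw - 1, W), indexing="ij"
    )
    zs = guide * (gd - 1)

    x0, x1, fx = _corners(xs, gw)
    y0, y1, fy = _corners(ys, gh)
    z0, z1, fz = _corners(zs, gd)

    # Trilinear interpolation of the 3x4 matrices over the 8 neighbouring cells.
    affine = np.zeros((H, W, 3, 4))
    for zi, wz in ((z0, 1 - fz), (z1, fz)):
        for yi, wy in ((y0, 1 - fy), (y1, fy)):
            for xi, wx in ((x0, 1 - fx), (x1, fx)):
                weight = (wz * wy * wx)[..., None, None]
                affine += weight * grid[zi, yi, xi]

    # Apply each pixel's affine transform to its homogeneous color [r, g, b, 1].
    homog = np.concatenate([rgb, np.ones((H, W, 1))], axis=-1)
    return np.einsum("hwij,hwj->hwi", affine, homog)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((5, 7, 3))
    out = slice_and_apply(identity_grid(gw=4, gh=3, gd=8), image)
    print(np.allclose(out, image))
```

In this sketch an identity-initialized grid leaves the image unchanged, so any optimization over the grid entries starts from "no ISP effect" and only departs from it where the data require. In a training setting the same slicing would need to be written in a differentiable framework so that grid entries can be optimized together with the radiance field; NumPy is used here only to show the data flow.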
