FA-BARF: Frequency Adapted Bundle-Adjusting Neural Radiance Fields

Rui Qian, Chenyangguang Zhang, Yan Di, Guangyao Zhai, Ruida Zhang, Jiayu Guo, Benjamin Busam, Jian Pu

TL;DR

FA-BARF introduces a frequency-adapted spatial low-pass filter to replace BARF's temporal frequency annealing, addressing frequency fluctuations that slow joint NeRF reconstruction and camera pose optimization. By leveraging Integrated Position Encoding (IPE) with cone-based sampling and covariance-aware encodings, FA-BARF stabilizes pose updates while maintaining or enhancing view synthesis quality. Theoretical analysis links NeRF frequency content to pose optimization and demonstrates how radial uncertainty overlaps across views improve convergence. Empirical results on synthetic and real-world scenes show FA-BARF accelerates training, improves pose accuracy, and yields better perceptual rendering, indicating strong potential for real-time dense 3D mapping and reconstruction under unknown poses.
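The Integrated Position Encoding (IPE) mentioned above comes from Mip-NeRF (Barron et al., 2021). Below is a minimal sketch of that encoding. It is not FA-BARF's implementation, because the authors' code has not been released. It assumes a diagonal covariance per sample, and the function name `integrated_pos_enc` is hypothetical. The sketch shows why IPE acts as a spatial low-pass filter. Each sin/cos band at frequency 2^l is multiplied by exp(-0.5 * 4^l * sigma^2), so samples with larger variance lose their high-frequency components.

```python
import numpy as np

def integrated_pos_enc(mean, var, num_freqs):
    """Sketch of Mip-NeRF-style integrated positional encoding (IPE).

    `mean` has shape (..., D) and `var` holds the matching diagonal
    variances. For x ~ N(mu, s^2), E[sin(2^l x)] = sin(2^l mu) * exp(-0.5 * 4^l * s^2),
    so each band is damped more strongly as variance or frequency grows.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    scales = 2.0 ** np.arange(num_freqs)                        # 2^l for l = 0..L-1
    scaled_mean = mean[..., None, :] * scales[:, None]          # shape (..., L, D)
    scaled_var = var[..., None, :] * (scales ** 2)[:, None]     # variance scales by 4^l
    damping = np.exp(-0.5 * scaled_var)                         # Gaussian low-pass factor
    enc = np.concatenate([np.sin(scaled_mean) * damping,
                          np.cos(scaled_mean) * damping], axis=-1)  # (..., L, 2D)
    return enc.reshape(*mean.shape[:-1], -1)                    # flatten bands
```

With zero variance the output reduces to the ordinary NeRF positional encoding. Increasing the variance suppresses the highest bands first.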

Abstract

Neural Radiance Fields (NeRF) have recently shown highly effective performance in photorealistic novel view synthesis. However, recovering 3D scenes from imperfect camera poses still relies on a hand-crafted frequency annealing strategy. This strategy applies a temporal low-pass filter that guarantees convergence but slows down the joint optimization of implicit scene reconstruction and camera registration. In this work, we introduce the Frequency Adapted Bundle Adjusting Radiance Field (FA-BARF). FA-BARF replaces the temporal low-pass filter with a frequency-adapted spatial low-pass filter to address this slowdown. We establish a theoretical framework that interprets the relationship between NeRF's position encoding and camera registration. We also show that our frequency-adapted filter can mitigate the frequency fluctuation caused by the temporal filter. Furthermore, we show that applying a spatial low-pass filter in NeRF can effectively optimize camera poses through radial uncertainty overlaps among different views. Extensive experiments show that FA-BARF accelerates the joint optimization process under small perturbations in object-centric scenes. It also recovers real-world scenes with unknown camera poses. This suggests broader possibilities for applying NeRF to dense 3D mapping and reconstruction under real-time requirements. The code will be released upon paper acceptance.
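For reference, the temporal low-pass filter referred to here is BARF's coarse-to-fine positional-encoding mask (Lin et al., 2021). In that mask, frequency band k receives weight w_k(alpha) = (1 - cos((alpha - k) * pi)) / 2 for 0 <= alpha - k < 1. The weight is 0 below that range and 1 above it, and alpha grows with training progress. The sketch below assumes a linear schedule for alpha between two training-progress fractions. The names `barf_pe_weights`, `start`, and `end`, and their default values, are hypothetical and chosen for illustration.

```python
import numpy as np

def barf_pe_weights(progress, num_freqs, start=0.1, end=0.5):
    """Sketch of a BARF-style coarse-to-fine (temporal) PE mask.

    `progress` is the fraction of training completed, in [0, 1].
    Between the hypothetical `start` and `end` fractions, alpha sweeps
    linearly from 0 to num_freqs. Band k is off while alpha < k, ramps
    up with a cosine for k <= alpha < k + 1, and is fully on afterwards.
    """
    t = np.clip((progress - start) / (end - start), 0.0, 1.0)
    alpha = t * num_freqs
    k = np.arange(num_freqs)
    x = np.clip(alpha - k, 0.0, 1.0)          # 0 -> weight 0, 1 -> weight 1
    return (1.0 - np.cos(x * np.pi)) / 2.0    # one weight per frequency band
```

These weights depend only on training progress, not on the sample being encoded. As a result, every point is filtered identically at a given step, and the spectrum seen by the optimizer shifts each time a new band switches on. This plausibly corresponds to the frequency switch that Figure 1(c) associates with frequency fluctuation.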

Paper Structure

This paper contains 17 sections, 19 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of the pose optimization process in FA-BARF and BARF. (a) FA-BARF uses a frequency-adapted spatial low-pass filter to adapt pose optimization across different frequencies. (b) BARF adopts a temporal low-pass filter to guide pose optimization from low to high frequencies. (c) The temporal low-pass filter causes frequency fluctuation, which impedes pose optimization whenever the frequency switches during training.
  • Figure 2: Visual interpretation of radial certainty overlaps with respect to camera poses. Following the definition in Mip-NeRF (barron2021mip), each sampled point has a surrounding Gaussian region. The covariance of that region decreases as the distance between the camera center and the sampled point decreases. Nearer samples therefore have higher certainty when guiding the direction of pose optimization. Color intensity indicates the degree of certainty: deeper color denotes higher certainty.
  • Figure 3: Accelerated reconstruction with FA-BARF and BARF on the lego scene. (a) Compares PSNR, with visualizations of view synthesis, for BARF without the positional encoding mask, original BARF, and FA-BARF as training time increases. (b) Compares PSNR, SSIM, and LPIPS for the three settings as training time increases. Within the same training time, FA-BARF achieves better reconstruction than original BARF, while BARF without the positional encoding mask gets stuck in sub-optimal results.
  • Figure 4: Accelerated registration with FA-BARF and BARF on the lego scene. FA-BARF makes camera poses converge faster than original BARF, while poses in BARF without the positional encoding mask diverge to sub-optimal results. Rotation errors are in degrees, and translation errors are scaled by 100.
  • Figure 5: Qualitative results of FA-BARF and BARF on synthetic scenes. We visualize the expected depth obtained through ray compositing (top) and the synthesized images (bottom). FA-BARF achieves the best view synthesis quality without a PE mask. Original BARF without the PE mask converges to suboptimal registration, which leads to synthesis artifacts.
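Figure 2 links sample certainty to distance from the camera. The following minimal sketch illustrates that relationship using Mip-NeRF's closed-form Gaussian approximation of a conical frustum (Barron et al., 2021). It is not FA-BARF-specific code. Take a frustum spanning ray distances t0 to t1, with t_mu = (t0 + t1) / 2 and t_delta = (t1 - t0) / 2. Its radial variance is r^2 * (t_mu^2 / 4 + 5 * t_delta^2 / 12 - 4 * t_delta^4 / (15 * (3 * t_mu^2 + t_delta^2))), which grows with distance. The function name `frustum_gaussian` and the parameter `base_radius` (standing in for the cone radius r) are hypothetical.

```python
def frustum_gaussian(t0, t1, base_radius):
    """Sketch of Mip-NeRF's Gaussian fit to a conical frustum.

    `base_radius` is an assumed cone radius at unit distance along the ray.
    Returns (mean distance along the ray, variance along the ray,
    radial variance perpendicular to the ray).
    """
    t_mu = (t0 + t1) / 2.0        # midpoint of the interval
    t_delta = (t1 - t0) / 2.0     # half-width of the interval
    denom = 3.0 * t_mu**2 + t_delta**2
    mean_t = t_mu + 2.0 * t_mu * t_delta**2 / denom
    var_t = t_delta**2 / 3.0 - (4.0 / 15.0) * (
        t_delta**4 * (12.0 * t_mu**2 - t_delta**2)) / denom**2
    var_r = base_radius**2 * (
        t_mu**2 / 4.0 + (5.0 / 12.0) * t_delta**2
        - (4.0 / 15.0) * t_delta**4 / denom)
    return mean_t, var_t, var_r

# Same interval width at two depths: the farther frustum is wider radially.
_, _, var_r_near = frustum_gaussian(1.0, 1.1, 0.01)
_, _, var_r_far = frustum_gaussian(4.0, 4.1, 0.01)
```

For equal interval widths, the farther frustum gets a larger radial variance and therefore lower certainty. This matches the shading described in Figure 2.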