FBINeRF: Feature-Based Integrated Recurrent Network for Pinhole and Fisheye Neural Radiance Fields

Yifan Wu, Tianyi Cheng, Peixu Xin, Janusz Konrad

TL;DR

FBINeRF targets robust neural radiance field optimization for both pinhole and fisheye images by introducing a feature-based recurrent framework with adaptive GRUs and a flexible bundle adjustment mechanism. It combines DenseNet-derived feature matrices, attention-based context integration, and IBRNet-based rendering to jointly refine camera poses and depth priors, with explicit distortion modeling for fisheye lenses. The approach yields high-fidelity novel-view synthesis and supports mesh generation for downstream applications, outperforming prior methods on both pinhole and fisheye datasets while converging faster. This makes NeRF more practical for distorted imagery and supports efficient, distortion-aware 3D reconstruction and visualization in real-world pipelines.

Abstract

Previous studies aiming to optimize and bundle-adjust camera poses using Neural Radiance Fields (NeRFs), such as BARF and DBARF, have demonstrated impressive capabilities in 3D scene reconstruction. However, these approaches have been designed for pinhole-camera pose optimization and do not perform well under radial image distortions such as those in fisheye cameras. Furthermore, inaccurate depth initialization in DBARF results in erroneous geometric information that affects the overall convergence and quality of results. In this paper, we propose adaptive GRUs with a flexible bundle-adjustment method adapted to radial distortions and incorporate feature-based recurrent neural networks to generate continuous novel views from fisheye datasets. Other NeRF methods for fisheye images, such as SCNeRF and OMNI-NeRF, use a projected ray distance loss for distorted pose refinement, which causes severe artifacts and long rendering times and makes them difficult to use in downstream tasks where the dense voxel representation generated by a NeRF method needs to be converted into a mesh representation. We also address depth initialization issues by adding MiDaS-based depth priors for pinhole images. Through extensive experiments, we demonstrate the generalization capacity of FBINeRF and show high-fidelity results for both pinhole-camera and fisheye-camera NeRFs.
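
To illustrate why a pinhole-only pose optimizer struggles under radial distortion, the following minimal sketch compares an ideal pinhole projection with the same projection followed by a polynomial radial distortion. The polynomial model, the function names, the intrinsics, and the coefficients k1 and k2 are all illustrative assumptions; the lens-distortion model actually used by FBINeRF is not specified in this summary.

```python
import math


def project_pinhole(point, focal, cx, cy):
    """Ideal pinhole projection of a camera-frame point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def project_radial(point, focal, cx, cy, k1, k2):
    """Pinhole projection followed by polynomial radial distortion.

    k1 and k2 are hypothetical coefficients chosen for illustration only.
    """
    x, y, z = point
    xn, yn = x / z, y / z                 # normalized image coordinates
    r2 = xn * xn + yn * yn                # squared radius from the optical axis
    scale = 1.0 + k1 * r2 + k2 * r2 * r2  # radial scaling factor
    return (focal * xn * scale + cx, focal * yn * scale + cy)


def pixel_error(point, focal=500.0, cx=320.0, cy=240.0, k1=-0.2, k2=0.02):
    """Pixel distance between the undistorted and distorted projections."""
    u0, v0 = project_pinhole(point, focal, cx, cy)
    u1, v1 = project_radial(point, focal, cx, cy, k1, k2)
    return math.hypot(u1 - u0, v1 - v0)


if __name__ == "__main__":
    # Points progressively farther from the optical axis.
    for p in [(0.0, 0.0, 1.0), (0.2, 0.0, 1.0), (0.8, 0.0, 1.0)]:
        print(p, round(pixel_error(p), 2))
```

Under these assumed values the mismatch is zero at the image center and grows toward the periphery, so a bundle-adjustment step that assumes a pinhole model would try to absorb a position-dependent error into the camera pose.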

Paper Structure (31 sections, 16 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: FBINeRF and DBARF reconstructions for natural and synthetic fisheye data. Two examples from the FisheyeNeRF dataset jeong2021selfcalibrating demonstrate that even very small radial distortions degrade the performance of DBARF but not of FBINeRF. Top: Chairs image - the DBARF reconstruction (no lens-distortion model) shows severe distortions on some chairs and the window blinds. Middle: Rock image - obvious distortions, especially at the periphery. Bottom: Synthetic image eichenseer2022data - cannot be reconstructed by DBARF.
  • Figure 2: Network architecture of FBINeRF. 1) The input can be either pinhole or fisheye views. Neighbor views are selected by keypoint matching: FeatureBooster wang2023featurebooster for pinhole views and SphereGlue inproceedings for fisheye views. 2) Image features are extracted by a DenseNet-like huang2018densely backbone and sent to subsequent blocks. 3) Depth priors are used to initialize the depth map in the pinhole model and are continually updated; the Metric-bins module enhances the predicted depth priors through supervised fine-tuning (see bhat2023zoedepth). 4) Pose/depth, image features, contextual features, cost map, and context vectors are sent to the AdamR-GRU PAL2023110457 optimizer to update poses and depth values. For fisheye views, the depth update is replaced by flexible bundle adjustment with a lens-distortion model to update camera poses. 5) IBRNet is used as the baseline to predict color and density values and is trained jointly with both pose-optimization blocks.
  • Figure 3: Qualitative comparison of depth maps predicted from the LLFF dataset mildenhall2019local. FBINeRF depth maps (produced by depth priors and the adaptive GRU optimizer) are more accurate than those from DBARF (generated by a depth header), even after fine-tuning.
  • Figure 4: Qualitative comparison of novel views generated from a synthetic fisheye NeRF dataset eichenseer2022data. Both SCNeRF and OMNI-NeRF produce clear distortions in the clouds, tree, and car wheel (top) and on the mirror and drawer (bottom), whereas FBINeRF does not.
  • Figure 5: Qualitative results on the fisheye NeRF dataset jeong2021selfcalibrating for SCNeRF and FBINeRF. Our method produces sharper, less noisy, and more accurate rendered views than SCNeRF. On average, SCNeRF requires over 6 hours of training per scene in this dataset to obtain the results shown in this figure, while ours generates continuous dense voxel fisheye novel views within half an hour.
  • ...and 1 more figure