ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli
TL;DR
ICON tackles the problem of learning Neural Radiance Fields from monocular video without pose initialization by introducing an incremental, confidence-guided optimization. It learns a Neural Confidence Field that dynamically reweights NeRF and pose gradients, and it couples incremental frame registration with restart strategies and a Sampson-distance geometric constraint. The approach achieves state-of-the-art or competitive results on CO3D and HO3D, often outperforming pipelines that rely on SfM poses and matching RGB-D methods in dynamic-object scenarios. This work advances camera-pose-free NeRF training and object-centric 3D reconstruction from RGB video, with potential extensions to more general video inputs and reduced reliance on depth sensors.
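The Sampson distance is a standard first-order approximation of the geometric error of a point correspondence under epipolar geometry, which makes it a natural cost for tying pose estimates to image matches. The sketch below computes the squared Sampson error for a 3x3 fundamental (or, in normalized coordinates, essential) matrix F. How ICON builds F from its current poses, and which correspondences it uses, is not described here; the helper names `skew` and `sampson_distance` are hypothetical and this is an illustration of the textbook formula, not the paper's implementation.

```python
def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def sampson_distance(F, x1, x2, eps=1e-12):
    """Squared Sampson error of the correspondence x1 <-> x2 under F.

    F is a 3x3 nested list with x2^T F x1 = 0 for a perfect match;
    x1 and x2 are (u, v) image coordinates (pixel or normalized).
    """
    p1 = (x1[0], x1[1], 1.0)  # homogeneous coordinates in image 1
    p2 = (x2[0], x2[1], 1.0)  # homogeneous coordinates in image 2
    # F p1: epipolar line of x1 in image 2
    Fp1 = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    # F^T p2: epipolar line of x2 in image 1
    Ftp2 = [sum(F[r][c] * p2[r] for r in range(3)) for c in range(3)]
    # Algebraic epipolar residual x2^T F x1
    residual = sum(p2[r] * Fp1[r] for r in range(3))
    denom = Fp1[0] ** 2 + Fp1[1] ** 2 + Ftp2[0] ** 2 + Ftp2[1] ** 2
    # eps guards against division by zero in degenerate configurations
    return residual ** 2 / max(denom, eps)
```

For example, with calibrated cameras and no rotation, the essential matrix is E = [t]_x R = skew(t). For a pure sideways translation t = (1, 0, 0), matches on the same image row give zero error, while a one-unit vertical offset gives 0.5. Some works use the square root of this quantity instead.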
Abstract
Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires accurate camera poses for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they still often rely on reasonable initial poses that they then refine. Here we aim to remove the requirement for pose initialization. We present Incremental CONfidence (ICON), an optimization procedure for training NeRFs from 2D video frames. ICON assumes only smooth camera motion to estimate an initial guess for the poses. Further, ICON introduces "confidence", an adaptive measure of model quality used to dynamically reweight gradients. ICON relies on high-confidence poses to learn the NeRF, and on high-confidence 3D structure (as encoded by the NeRF) to learn poses. We show that ICON, without prior pose initialization, achieves superior performance on both CO3D and HO3D compared with methods that use SfM poses.
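One simple way to turn a smooth-motion assumption into an initial pose guess for the next frame is constant-velocity extrapolation: reuse the relative motion between the two most recent poses. The sketch below is a hypothetical illustration of that idea, not the procedure described in the paper. It assumes poses are 4x4 camera-to-world rigid transforms stored as nested lists, and the names `matmul4`, `rigid_inverse`, `extrapolate_pose`, and `pose_rz` are our own.

```python
import math


def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(T):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def extrapolate_pose(T_prev, T_curr):
    """Constant-velocity guess for the next camera-to-world pose.

    The relative motion delta = T_prev^{-1} T_curr is applied once more
    to T_curr, i.e. T_next = T_curr * delta.
    """
    delta = matmul4(rigid_inverse(T_prev), T_curr)
    return matmul4(T_curr, delta)


def pose_rz(theta, tx=0.0, ty=0.0, tz=0.0):
    """Rigid transform: rotation by theta (radians) about z plus a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]
```

For instance, `extrapolate_pose(pose_rz(0.0), pose_rz(0.3))` returns (up to floating-point error) `pose_rz(0.6)`. Because `delta` is expressed in the previous camera's frame, a steady turn keeps turning rather than being replayed as a fixed world-frame offset; any such guess would still need to be refined by the joint optimization.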
