Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
Xunzhi Zheng, Dan Xu
TL;DR
Flow-NeRF tackles pose-free NeRF by jointly learning camera poses, scene geometry, and dense optical flow within a unified neural representation. It introduces a two-branch architecture with shared point sampling, a pose-conditioned bijective mapping for dense novel-view flow via Real-NVP, and a feature message-passing path that distills flow information into geometry. The learning objective combines photometric, depth, point-cloud, and flow losses to produce accurate novel-view synthesis, depth estimation, pose prediction, and long-range novel-view flow. Experiments on Tanks & Temples, ScanNet, and Sintel demonstrate substantial improvements in NVS and depth and competitive long-range flow, enabling holistic scene modeling and meaningful correspondences across novel views.
Abstract
Learning accurate scene reconstruction without pose priors in neural radiance fields is challenging due to inherent geometric ambiguity. Recent work either relies on correspondence priors for regularization or uses off-the-shelf flow estimators to derive analytical poses. However, the potential for jointly learning scene geometry, camera poses, and dense flow within a unified neural representation remains largely unexplored. In this paper, we present Flow-NeRF, a unified framework that simultaneously optimizes scene geometry, camera poses, and dense optical flow on the fly. To enable the learning of dense flow within the neural radiance field, we design a bijective mapping for flow estimation, conditioned on pose. To make scene reconstruction benefit from flow estimation, we develop an effective feature enhancement mechanism that passes canonical space features to world space representations, significantly enhancing scene geometry. We validate our model on four important tasks, i.e., novel view synthesis, depth estimation, camera pose prediction, and dense optical flow estimation, using several datasets. Our approach surpasses previous methods in almost all metrics for novel view synthesis and depth estimation and yields both qualitatively sound and quantitatively accurate novel-view flow. Our project page is https://zhengxunzhi.github.io/flownerf/.
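To make the idea of a pose-conditioned bijective mapping concrete, the following is a minimal sketch of a single Real-NVP-style affine coupling layer acting on 3D points. It is not the Flow-NeRF implementation: the class name `PoseConditionedCoupling`, the 6-dimensional pose code, the tiny random-weight MLP, and the choice of which coordinate is held fixed are all assumptions made for illustration. The sketch only shows why such a layer is invertible in closed form: the scale and shift depend on the untouched coordinate and the pose code, so the same values can be recomputed when inverting.

```python
import numpy as np


class PoseConditionedCoupling:
    """One affine coupling layer on 3D points, conditioned on a pose code.

    Hypothetical illustration: names, sizes, and weights are assumptions,
    not taken from the paper.
    """

    def __init__(self, pose_dim, hidden=16, keep_index=0, seed=0):
        rng = np.random.default_rng(seed)
        self.keep = keep_index
        self.rest = [i for i in range(3) if i != keep_index]
        in_dim = 1 + pose_dim  # the kept coordinate plus the pose code
        self.W1 = rng.normal(scale=0.5, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.5, size=(hidden, 4))  # 2 log-scales, 2 shifts
        self.b2 = np.zeros(4)

    def _scale_shift(self, kept, pose):
        # Small MLP: (kept coordinate, pose code) -> log-scale and shift.
        h = np.tanh(np.concatenate([kept, pose], axis=-1) @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        log_s = np.tanh(out[..., :2])  # bounded log-scale for stability
        t = out[..., 2:]
        return log_s, t

    def forward(self, x, pose):
        # The kept coordinate passes through; the other two are transformed.
        log_s, t = self._scale_shift(x[..., [self.keep]], pose)
        y = x.copy()
        y[..., self.rest] = x[..., self.rest] * np.exp(log_s) + t
        return y

    def inverse(self, y, pose):
        # The kept coordinate is unchanged, so scale and shift are recomputable.
        log_s, t = self._scale_shift(y[..., [self.keep]], pose)
        x = y.copy()
        x[..., self.rest] = (y[..., self.rest] - t) * np.exp(-log_s)
        return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    points = rng.normal(size=(5, 3))  # sampled 3D points
    pose = rng.normal(size=(5, 6))    # assumed per-point pose code
    layer = PoseConditionedCoupling(pose_dim=6)
    mapped = layer.forward(points, pose)
    recovered = layer.inverse(mapped, pose)
    print(np.allclose(recovered, points))
```

In practice several such layers would be stacked with different `keep_index` values so that every coordinate is transformed at least once; the composite remains bijective because each layer is. Under this reading, a dense correspondence between two views could be obtained by mapping points forward under one pose code and back under another, though the exact formulation used by Flow-NeRF is not specified in this section.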
