KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
Yifan Zhan, Zhuoxiao Li, Muyao Niu, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng
TL;DR
KFD-NeRF tackles dynamic view synthesis by modeling the 4D radiance field as a dynamic system and fusing predictions from a locally linear motion model with direct observations via a plug-in Kalman filter. The filter estimates the framewise deformation $d\bm{x}_{t_i}$ from the observation $y_{t_i}$ and the prior prediction $\hat{d\bm{x}}_{t_i}^-$. The method uses a two-branch deformation field (observer and predictor) and encodes the canonical space with an efficient tri-plane representation, augmented by regularization losses on the canonical space and a temporal-information schedule during training. Key contributions include the first integration of a neural Kalman filter into deformation-based dynamic NeRFs, a two-branch motion estimation design, and a regularized, fast-converging canonical-space embedding that enables a shallow observation MLP. Empirically, KFD-NeRF achieves state-of-the-art or competitive view synthesis on both synthetic and real dynamic scenes at a training cost comparable to prior dynamic NeRF methods, highlighting the practical value of incorporating temporal priors and Kalman-based fusion for 4D radiance fields.
Abstract
We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering. Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions. We introduce a novel plug-in Kalman-filter-guided deformation field that enables accurate deformation estimation from scene observations and predictions. We use a shallow Multi-Layer Perceptron (MLP) for observations and model the motion as locally linear to calculate predictions with motion equations. To further enhance the performance of the observation MLP, we introduce regularization in the canonical space to facilitate the network's ability to learn warping for different frames. Additionally, we employ an efficient tri-plane representation for encoding the canonical space, which has been experimentally demonstrated to converge quickly with high quality. This enables us to use a shallower observation MLP, consisting of just two layers in our implementation. We conduct experiments on synthetic and real data and compare with past dynamic NeRF methods. Our KFD-NeRF demonstrates similar or even superior rendering performance within comparable computational time and achieves state-of-the-art view synthesis performance with thorough training.
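To make the fusion of observations and predictions concrete, the following is a minimal sketch of a textbook Kalman update applied to a per-point deformation. It is illustrative only and is not the implementation described here: the function names `predict_linear` and `kalman_fuse`, the constant-velocity extrapolation over uniformly spaced frames, the identity observation model, and the hand-set variances are all assumptions made for this example, whereas KFD-NeRF obtains its observations from a shallow MLP.

```python
import numpy as np


def predict_linear(dx_prev, dx_prev2):
    """Predict the current deformation under a locally linear motion model.

    Assumption for this sketch: frames are uniformly spaced, so the velocity
    is the difference of the two previous deformations and the prediction is
    a constant-velocity extrapolation.
    """
    return 2.0 * dx_prev - dx_prev2


def kalman_fuse(dx_pred, var_pred, y_obs, var_obs):
    """Fuse a predicted deformation with an observed one.

    Standard Kalman update with an identity observation model and
    element-wise (diagonal) variances; both are assumptions of this sketch.
    Returns the fused deformation, its variance, and the Kalman gain.
    """
    gain = var_pred / (var_pred + var_obs)
    dx_post = dx_pred + gain * (y_obs - dx_pred)
    var_post = (1.0 - gain) * var_pred
    return dx_post, var_post, gain


# Hypothetical deformations of one 3D point at the two previous frames.
dx_prev2 = np.array([0.0, 0.0, 0.0])
dx_prev = np.array([0.1, 0.0, 0.0])

# Prediction from the motion model, and a hypothetical observation.
dx_pred = predict_linear(dx_prev, dx_prev2)
y_obs = np.array([0.3, 0.0, 0.0])

# Equal uncertainty in prediction and observation (hand-set for the example).
dx_post, var_post, gain = kalman_fuse(dx_pred, 1.0, y_obs, 1.0)
print(dx_post, var_post, gain)
```

With equal variances the gain is 0.5, so the fused deformation lies midway between the prediction and the observation. Lowering `var_obs` pulls the estimate toward the observation, and lowering `var_pred` pulls it toward the motion-model prediction.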
