KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter

Yifan Zhan, Zhuoxiao Li, Muyao Niu, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng

TL;DR

KFD-NeRF tackles dynamic view synthesis by modeling the 4D radiance field as a dynamic system. A plug-in Kalman filter fuses predictions from a locally linear motion model with direct observations, yielding the framewise deformation $d\bm{x}_{t_i}$ from the observation $y_{t_i}$ and the prediction $\hat{d\bm{x}}_{t_i}^-$. The method uses a two-branch deformation field (observer and predictor) and encodes the canonical space with an efficient tri-plane representation, augmented by regularization losses on the canonical space and a temporal-information schedule during training. Key contributions include the first integration of a neural Kalman filter into deformation-based dynamic NeRFs, a two-branch motion estimation design, and a regularized, fast-converging canonical-space embedding that enables a shallow observation MLP. Empirically, KFD-NeRF achieves state-of-the-art or competitive view synthesis on both synthetic and real dynamic scenes at comparable training cost, highlighting the practical value of temporal priors and Kalman-based fusion for 4D radiance fields.

Abstract

We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering. Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions. We introduce a novel plug-in Kalman filter guided deformation field that enables accurate deformation estimation from scene observations and predictions. We use a shallow Multi-Layer Perceptron (MLP) for observations and model the motion as locally linear to calculate predictions with motion equations. To further enhance the performance of the observation MLP, we introduce regularization in the canonical space to facilitate the network's ability to learn warping for different frames. Additionally, we employ an efficient tri-plane representation for encoding the canonical space, which has been experimentally demonstrated to converge quickly with high quality. This enables us to use a shallower observation MLP, consisting of just two layers in our implementation. We conduct experiments on synthetic and real data and compare with past dynamic NeRF methods. Our KFD-NeRF demonstrates similar or even superior rendering performance within comparable computational time and achieves state-of-the-art view synthesis performance with thorough training.
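The fusion of observations and predictions described above can be illustrated with a minimal sketch. It is not the authors' implementation: the constant-velocity extrapolation, the scalar variance-based gain, and all names (`predict_linear`, `kalman_fuse`, `pred_var`, `obs_var`) are assumptions made for illustration. In the paper the Kalman gain is learned from noise-related terms, whereas here the classical closed-form gain stands in for it.

```python
import numpy as np


def predict_linear(dx_prev, dx_prev2):
    """Locally linear prediction (assumed form): extrapolate the deformation
    at t_i from the two previous frames with constant velocity."""
    return dx_prev + (dx_prev - dx_prev2)


def kalman_fuse(pred, obs, pred_var, obs_var):
    """Classical scalar Kalman update, used as a stand-in for the learned gain.

    A gain near 1 trusts the observation; a gain near 0 trusts the prediction.
    """
    gain = pred_var / (pred_var + obs_var)
    fused = pred + gain * (obs - pred)
    fused_var = (1.0 - gain) * pred_var
    return fused, gain, fused_var


# Hypothetical deformations of one sample point at t_{i-2} and t_{i-1}.
dx_prev2 = np.array([0.00, 0.00, 0.00])
dx_prev = np.array([0.10, 0.00, -0.05])

# Hypothetical output of the observation branch at t_i.
obs = np.array([0.30, 0.02, -0.10])

pred = predict_linear(dx_prev, dx_prev2)
fused, gain, fused_var = kalman_fuse(pred, obs, pred_var=0.01, obs_var=0.03)
print("prediction:", pred)
print("gain:", gain)
print("fused deformation:", fused)
```

With these illustrative variances the observation is treated as noisier than the prediction, so the fused deformation stays closer to the prediction. Swapping the two variances shifts the estimate toward the observation.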

Paper Structure

This paper contains 19 sections, 16 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: In contrast to a vanilla deformation field, our plug-in Kalman filter guided deformation field adds a prediction branch alongside the direct observations from input data. From the noise-related terms $\varepsilon_{t_i}$ and $\varepsilon_{t_{i-1}}$ we learn the Kalman gain $K_{t_i}$, which weights observations $y_{t_{i}}$ and predictions $\hat{d\bm{x}}_{t_i}^-$ to give more accurate deformation estimates.
  • Figure 2: Visualization of feature planes learned by the feature interpolation method ➀ and the deformation fields method ➁. We show a point $P$ in the real world space and its corresponding four points $P_1$, $P_2$, $P_3$ and $P_4$ in the canonical space at four different timestamps. The feature plane in ➁ exhibits better smoothness compared to ➀, so we use ➁ to construct our backbone.
  • Figure 3: The overall pipeline of our KFD-NeRF. Our designed deformation field calculates final deformations based on two sources of knowledge. At the observation stage, system observations are directly output by a shallow observation MLP. At the prediction and fusion stage, we calculate predictions based on system dynamics, and further fuse observations and predictions to obtain final deformation estimations. At the spatial reconstruction stage, tri-plane encoded canonical points are concatenated with raw positions and timestamps, which are further decoded to obtain predicted colors for loss calculation. (*$f$ represents a linear layer, used to obtain the Kalman gain from noise-related terms $\varepsilon_{t_i}$ and $\varepsilon_{t_{i-1}}$.)
  • Figure 4: Qualitative comparison of our KFD-NeRF against other dynamic NeRF methods on synthetic data. Zoom in for better details.
  • Figure 5: Qualitative comparison of our KFD-NeRF against other dynamic NeRF methods on real data. Zoom in for better details.
  • ...and 2 more figures
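
The tri-plane encoding of canonical points mentioned in the abstract and in the Figure 3 caption can be sketched as follows. This is a generic illustration, not the paper's code: the plane resolution, the channel count, the normalization of coordinates to [0, 1], and the choice to concatenate the three plane features are assumptions, and `bilinear_sample` and `triplane_encode` are hypothetical helper names.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly interpolate an (H, W, C) feature plane at (u, v) in [0, 1]."""
    h, w, _ = plane.shape
    x = u * (w - 1)
    y = v * (h - 1)
    x0 = int(np.floor(x))
    y0 = int(np.floor(y))
    x1 = min(x0 + 1, w - 1)  # clamp so u = 1 or v = 1 stays in bounds
    y1 = min(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = (1 - wx) * plane[y0, x0] + wx * plane[y0, x1]
    bottom = (1 - wx) * plane[y1, x0] + wx * plane[y1, x1]
    return (1 - wy) * top + wy * bottom


def triplane_encode(planes, point):
    """Project a canonical point onto the XY, XZ and YZ planes and
    concatenate the three interpolated feature vectors (assumed fusion)."""
    x, y, z = point
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return np.concatenate([f_xy, f_xz, f_yz])


# Random planes stand in for learned features: 8x8 resolution, 4 channels each.
rng = np.random.default_rng(0)
planes = {name: rng.standard_normal((8, 8, 4)) for name in ("xy", "xz", "yz")}

canonical_point = np.array([0.25, 0.5, 0.75])  # assumed normalized to [0, 1]
feature = triplane_encode(planes, canonical_point)
print(feature.shape)
```

Three 2D planes store far fewer parameters than a dense 3D grid at the same resolution. Per the Figure 3 caption, the encoded feature is then concatenated with raw positions and timestamps before decoding.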