Regularizing Dynamic Radiance Fields with Kinematic Fields

Woobin Im, Geonho Cha, Sebin Lee, Jumin Lee, Juhyeong Seon, Dongyoon Wee, Sung-Eui Yoon

TL;DR

The paper tackles dynamic scene reconstruction from monocular video by introducing a kinematic field that outputs velocity $\mathbf{v}$, acceleration $\mathbf{a}$, and jerk $\mathbf{j}$ and learns it jointly with dynamic and static radiance fields $\mathcal{F}_{\text{DY}}$ and $\mathcal{F}_{\text{ST}}$. It couples these fields through photometric losses and physics-informed regularizers, including transport and rigidity terms, to enforce physically plausible motion and trajectories. Empirical results on the NDVS dataset demonstrate improved rendering quality and motion consistency, with faster training times compared to prior methods like NSFF, DynamicNeRF, and HyperNeRF, while also providing rich kinematic estimates. This physics-grounded framework advances monocular dynamic 3D reconstruction, enabling more accurate 4D representations and motion-aware scene understanding for real-world applications.

Abstract

This paper presents a novel approach for reconstructing dynamic radiance fields from monocular videos. We integrate kinematics with dynamic radiance fields, bridging the gap between the sparse nature of monocular videos and real-world physics. Our method introduces the kinematic field, which captures motion through kinematic quantities: velocity, acceleration, and jerk. The kinematic field is learned jointly with the dynamic radiance field by minimizing the photometric loss, without motion ground truth. We further augment our method with physics-driven regularizers grounded in kinematics, which ensure the physical validity of the predicted kinematic quantities, including advective acceleration and jerk. Additionally, we control the motion trajectory based on rigidity equations formed with the predicted kinematic quantities. In experiments, our method outperforms state-of-the-art methods by capturing physical motion patterns in challenging real-world monocular videos.
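To make the role of the three kinematic quantities concrete, the sketch below shows how velocity, acceleration, and jerk can move a point forward in time with a third-order Taylor expansion. It assumes the textbook form $\Delta\mathbf{x} = \mathbf{v}\,\Delta t + \tfrac{1}{2}\mathbf{a}\,\Delta t^2 + \tfrac{1}{6}\mathbf{j}\,\Delta t^3$; the paper's own Taylor equation may differ in detail, and the function names `taylor_displacement` and `warp_point` are hypothetical.

```python
# Illustrative sketch only: a third-order Taylor displacement built from
# velocity, acceleration, and jerk. The function names are hypothetical and
# the exact form used in the paper may differ.

def taylor_displacement(v, a, j, dt):
    """Return the displacement of one 3D point over a time step dt.

    v, a, j are 3-tuples holding velocity, acceleration, and jerk
    evaluated at the point's current position and time.
    """
    return tuple(
        vi * dt + ai * dt ** 2 / 2.0 + ji * dt ** 3 / 6.0
        for vi, ai, ji in zip(v, a, j)
    )


def warp_point(x, v, a, j, dt):
    """Move the point x forward (dt > 0) or backward (dt < 0) in time."""
    d = taylor_displacement(v, a, j, dt)
    return tuple(xi + di for xi, di in zip(x, d))


if __name__ == "__main__":
    x0 = (0.0, 0.0, 0.0)
    v = (1.0, 0.0, 0.0)   # velocity along x
    a = (0.0, 2.0, 0.0)   # acceleration along y
    j = (0.0, 0.0, 6.0)   # jerk along z
    print(warp_point(x0, v, a, j, 0.5))
```

For motion with constant jerk the expansion is exact, which makes it a convenient sanity check; for general motion it is only a local approximation whose error grows with the size of the time step.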

Paper Structure (32 sections, 26 equations, 14 figures, 8 tables)

Figures (14)

  • Figure 1: Radiance and kinematic fields. This figure summarizes the three fields our method utilizes. The static and dynamic radiance fields are used for rendering, and the kinematic field is used in the training phase, regularizing the dynamic radiance field.
  • Figure 2: Visualization of each predicted component. The top row presents synthesized views at different times. Panels (a) and (b) show the dynamic and static components of the scene at time $t_2$ separately. In (b), radiance might not be correct in static areas that are unobservable throughout the entire sequence (e.g., the space behind the jumping people). The displacement field (c) can be computed by Taylor approximation (Eq. \ref{eq:taylor}) from the motion fields (d-f): velocity, acceleration, and jerk. Each field is visualized by reprojecting it to the camera view. The standard HSV visualization \cite{baker2011database} is used to colorize arrows.
  • Figure 3: Effect of kinematic regularization. We visualize the rendered RGB and motion of each field. Without kinematic regularization, motion fields tend to show granular patterns. Our kinematic regularization not only makes the fields smoother but also makes them satisfy the kinematic property, i.e., $\mathbf{a} = \partial \mathbf{v}/\partial t + (\mathbf{v}\cdot\nabla) \mathbf{v}$. We abbreviate the advective equation to 'Adv.' in the figure.
  • Figure 4: Visualization of density variation along a spatial coordinate $x$. The plot displays the gradient $-\partial \mathcal{L}_\text{T}/\partial v_{\sigma_d}$ with arrows. With the velocity field directed to the right (i.e., $\mathbf{v}_x = 0.3$), we can compute the gradient of the transport regularization $\mathcal{L}_\text{T}$ w.r.t. the rate of density change $v_{\sigma_d} = \partial \sigma_d / \partial t$. Minimizing $\mathcal{L}_\text{T}$ allows us to render the density at $t+\Delta t$ aligned with the flow field.
  • Figure 5: Photometric consistency loss. During optimization, we deform each ray from a reference time and pose $(t, P_t)$ to different timestamps $\gamma$ and $i$. Here, we sample $i$ from the neighboring frame times, and we then sample an intermediate timestamp $\gamma\sim \mathcal{U}(t, i)$. Given each timestamp, we can deform a ray using Eq. \ref{eq:taylor}. The photometric loss $\mathcal{L}_\text{photo}$ is computed based on the color consistency between the deformed ray and the original color, enhancing temporal consistency.
  • ...and 9 more figures
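
The kinematic property quoted in the Figure 3 caption, $\mathbf{a} = \partial \mathbf{v}/\partial t + (\mathbf{v}\cdot\nabla)\mathbf{v}$, can be checked numerically on a toy example. The sketch below is a one-dimensional illustration that uses central finite differences on an analytic velocity function; it is not the paper's implementation, which works on learned 3D fields, and the names `advective_acceleration` and `advective_residual` are hypothetical.

```python
# Illustrative 1D sketch of the advective (material) acceleration
#   a = dv/dt + v * dv/dx
# evaluated with central finite differences. Names are hypothetical; a
# learned 3D field would more likely use automatic differentiation.

def advective_acceleration(v, x, t, h=1e-4):
    """Acceleration of a particle carried by the velocity field v(x, t)."""
    dv_dt = (v(x, t + h) - v(x, t - h)) / (2.0 * h)  # local rate of change
    dv_dx = (v(x + h, t) - v(x - h, t)) / (2.0 * h)  # spatial gradient
    return dv_dt + v(x, t) * dv_dx


def advective_residual(a_pred, v, x, t):
    """Squared mismatch between a predicted acceleration and the value
    implied by the velocity field; a regularizer could penalize this."""
    return (a_pred - advective_acceleration(v, x, t)) ** 2


if __name__ == "__main__":
    # Particles moving as x(t) = x0 * (1 + t) each keep a constant velocity,
    # so the field v(x, t) = x / (1 + t) has zero material acceleration even
    # though dv/dt is nonzero at any fixed position.
    expanding = lambda x, t: x / (1.0 + t)
    print(advective_acceleration(expanding, 2.0, 1.0))
```

The example shows why the advective term matters: at a fixed location the velocity of the expanding field changes over time, yet no particle actually accelerates, and only the full expression recovers that.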