Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang
TL;DR
This work introduces DOMA, a compact implicit motion model that learns spatiotemporal affine motion fields for generic 3D scenes by feeding time as an additional input to a SIREN-based network. DOMA extends the prior Dynamic Point Field (DPF) from two-frame deformations to continuous multi-frame dynamics. It achieves temporal smoothness through wave-equation-inspired regularization and gains representational power from additional output degrees of freedom (DOFs) in the affine part. The key theoretical insight is a Jacobian-based analysis showing that extra DOFs in the output layer increase local motion complexity without expanding the hidden layers, which enables a much smaller yet expressive model (e.g., $16d + nd^2$ parameters versus $(6d+nd^2)(T-1)$ for frame-wise modeling). Empirically, the DOMA variants, especially DOMA-Affinity, show superior novel point motion prediction and temporal mesh alignment on challenging datasets such as DeformingThings4D and ReSynth, with significantly smaller model sizes (around 200 KB vs. 8 MB) and improved temporal regularity. These results support the approach and its potential for robust, prior-free motion modeling in dynamic scenes.
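To make the two parameter-count expressions in the TL;DR concrete, the short sketch below evaluates them side by side. It assumes that $d$ is the hidden width, $n$ the number of hidden layers, and $T$ the number of frames; these readings, the function names, and the example values are illustrative assumptions, not settings taken from the paper.

```python
def doma_param_count(d: int, n: int) -> int:
    """Evaluate 16d + n*d^2, the count quoted for the time-conditioned model."""
    return 16 * d + n * d * d


def framewise_param_count(d: int, n: int, num_frames: int) -> int:
    """Evaluate (6d + n*d^2)(T-1), the count quoted for frame-wise modeling."""
    return (6 * d + n * d * d) * (num_frames - 1)


# Hypothetical values, chosen only for illustration.
d, n, T = 128, 3, 41
shared = doma_param_count(d, n)
per_frame = framewise_param_count(d, n, T)
print(shared, per_frame, per_frame / shared)  # 51200 1996800 39.0
```

At 4 bytes per parameter, these hypothetical values correspond to about 200 KB and about 8 MB, the same order of magnitude as the sizes quoted above. The shared count does not depend on $T$, whereas the frame-wise count grows linearly with it.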
Abstract
Understanding the dynamics of generic 3D scenes is a fundamental challenge in computer vision and is essential for applications such as scene reconstruction, motion tracking, and avatar creation. In this work, we cast the task as inferring dense, long-range motion of 3D points. By observing a set of point trajectories, we aim to learn an implicit motion field, parameterized by a neural network, that predicts the movement of novel points within the same domain without relying on any data-driven or scene-specific priors. Our approach builds upon the recently introduced dynamic point field model, which learns smooth deformation fields between the canonical frame and individual observation frames. That model, however, neglects temporal consistency between consecutive frames, and its number of parameters grows linearly with the sequence length because each frame is modeled separately. To address these shortcomings, we exploit the intrinsic regularization provided by SIREN and modify the input layer to produce a spatiotemporally smooth motion field. Additionally, we analyze the Jacobian matrix of the motion field and find that the motion degrees of freedom (DOFs) in an infinitesimal area around a point and the network hidden variables affect the model's representational power in different ways. This enables us to improve the model's representational capability while keeping it compact. Furthermore, to reduce the risk of overfitting, we introduce a regularization term based on the assumption of piece-wise motion smoothness. Our experiments assess the model's performance in predicting unseen point trajectories and its application to temporal mesh alignment with guidance. The results demonstrate its superiority and effectiveness. The code and data for the project are publicly available: https://yz-cnsdqz.github.io/eigenmotion/DOMA/
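As a rough illustration of a time-conditioned SIREN motion field with an affine output, the NumPy sketch below maps a point and a time stamp to a per-point linear part and translation and applies them to the point. It is a minimal sketch under stated assumptions, not the authors' implementation: the bias-free layers, the 12-dimensional output split into a 3x3 residual matrix plus a translation, the frequency `w0 = 30`, and the names `init_siren` and `affine_motion` are all hypothetical choices made for this example.

```python
import numpy as np


def init_siren(rng, d=64, n=3, w0=30.0):
    """Bias-free SIREN-style weights: 4 inputs (x, y, z, t) -> 12 outputs."""
    first = rng.uniform(-1.0 / 4, 1.0 / 4, size=(4, d))
    bound = np.sqrt(6.0 / d) / w0
    hidden = [rng.uniform(-bound, bound, size=(d, d)) for _ in range(n)]
    last = rng.uniform(-bound, bound, size=(d, 12))
    return first, hidden, last


def affine_motion(params, points, t, w0=30.0):
    """Move points of shape (N, 3) to time t with a per-point affine map."""
    first, hidden, last = params
    # Append the time stamp to every point: (N, 3) -> (N, 4).
    xt = np.concatenate([points, np.full((len(points), 1), t)], axis=1)
    h = np.sin(w0 * xt @ first)
    for w in hidden:
        h = np.sin(w0 * h @ w)
    out = h @ last  # (N, 12)
    # Assumption: the linear part is predicted as a residual on the identity.
    A = np.eye(3) + out[:, :9].reshape(-1, 3, 3)
    b = out[:, 9:]
    return np.einsum("nij,nj->ni", A, points) + b


rng = np.random.default_rng(0)
params = init_siren(rng, d=32, n=2)
canonical = rng.normal(size=(5, 3))
moved = affine_motion(params, canonical, t=0.5)
print(moved.shape)  # (5, 3)
```

Because time enters as an ordinary input coordinate, one set of weights covers every frame, and querying a new time only changes `t`. Under the assumptions above (4 inputs, 12 outputs, no biases), the weight count is $4d + nd^2 + 12d = 16d + nd^2$, the same expression quoted in the TL;DR; whether the authors' network is laid out exactly this way is not stated here.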
