Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia, Jan van Gemert, Daan Brinks, Nergis Tömen
TL;DR
This work tackles unsupervised physical parameter estimation from videos governed by known continuous equations. It addresses the limitations of supervised approaches, which require labels, and of unsupervised frame-prediction approaches, which are typically restricted to motion. It proposes a decoder-free architecture that learns latent dynamics via an encoder and a differentiable physics block, optimized with a two-term loss that includes a KL-divergence regularizer to prevent collapse. The authors validate their method on synthetic datasets with damped second-order dynamics and demonstrate robustness to initialization, outperforming baselines that rely on frame prediction or masks. They further introduce Delfys75, a real-world dataset with ground-truth parameters across five dynamical systems, and show competitive parameter recovery without object masks, highlighting practical applicability to real-world video physics. Overall, the method advances unsupervised, decoder-free estimation of physical parameters from diverse video-based dynamical systems and provides Delfys75 as a new benchmark for evaluation.
Abstract
Extracting physical dynamical system parameters from recorded observations is key in natural science. Current methods for automatic parameter estimation from video train supervised deep networks on large datasets. Such datasets require labels, which are difficult to acquire. Some unsupervised techniques, which depend on frame prediction, exist, but they suffer from long training times and initialization instabilities, consider only motion-based dynamical systems, and are evaluated mainly on synthetic data. In this work, we propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos. The method is suitable for different dynamical systems beyond motion and is robust to initialization. Moreover, we remove the need for frame prediction by implementing a KL-divergence-based loss function in the latent space, which avoids convergence to trivial solutions and reduces model size and compute. We first evaluate our model on synthetic data, as is commonly done. We then take the field closer to reality by recording Delfys75, our own real-world dataset of 75 videos covering five different types of dynamical systems, on which we evaluate our method and others. Our method compares favorably to others. Code and data are available online: https://github.com/Alejandro-neuro/Learning_physics_from_video.
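
As a rough illustration of what a decoder-free loss in latent space could look like, the following Python sketch combines a physics-consistency term with a KL term. It is not the authors' implementation. The damped second-order equation z'' + gamma z' + omega2 z = 0, its finite-difference discretization, the moment-matched Gaussian form of the KL term, and all names (simulate, physics_step, physics_loss, kl_to_standard_normal, total_loss, beta) are assumptions made for illustration. In a real model an encoder would produce the latent sequence z from video frames; here z is simulated.

```python
import numpy as np

def physics_step(z_prev, z_curr, gamma, omega2, dt):
    """Predict z[t+1] from z[t-1], z[t] for z'' + gamma z' + omega2 z = 0.

    Assumed discretization: central difference for z'', backward
    difference for z'. Works on scalars or NumPy arrays.
    """
    return (2.0 * z_curr - z_prev
            - dt * gamma * (z_curr - z_prev)
            - dt ** 2 * omega2 * z_curr)

def simulate(gamma, omega2, dt, n, z0=1.0):
    """Stand-in for an encoder output: roll the assumed dynamics forward."""
    z = np.empty(n)
    z[0] = z[1] = z0  # start (approximately) at rest
    for t in range(1, n - 1):
        z[t + 1] = physics_step(z[t - 1], z[t], gamma, omega2, dt)
    return z

def physics_loss(z, gamma, omega2, dt):
    """Mean squared error between predicted and observed next latents."""
    pred = physics_step(z[:-2], z[1:-1], gamma, omega2, dt)
    return float(np.mean((pred - z[2:]) ** 2))

def kl_to_standard_normal(z, eps=1e-8):
    """KL( N(mean(z), var(z)) || N(0, 1) ), a hypothetical anti-collapse term."""
    mu = z.mean()
    var = z.var() + eps  # eps keeps log finite for a constant sequence
    return float(0.5 * (var + mu ** 2 - 1.0 - np.log(var)))

def total_loss(z, gamma, omega2, dt, beta=1.0):
    """Two-term loss: physics consistency plus beta-weighted KL regularizer."""
    return physics_loss(z, gamma, omega2, dt) + beta * kl_to_standard_normal(z)

if __name__ == "__main__":
    dt = 0.05
    z = simulate(gamma=0.5, omega2=4.0, dt=dt, n=200)
    collapsed = np.zeros_like(z)  # trivial latent: satisfies the ODE exactly
    print("physics loss, true params :", physics_loss(z, 0.5, 4.0, dt))
    print("physics loss, wrong gamma :", physics_loss(z, 2.0, 4.0, dt))
    print("total loss, real latent   :", total_loss(z, 0.5, 4.0, dt))
    print("total loss, collapsed     :", total_loss(collapsed, 0.5, 4.0, dt))
```

In this sketch an all-zero latent sequence satisfies the physics term exactly, which is the kind of trivial solution the abstract refers to. The KL term assigns it a large penalty because its variance is close to zero, so the combined loss prefers the non-degenerate trajectory.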
