
Vision-based Discovery of Nonlinear Dynamics for 3D Moving Target

Zitong Zhang, Yang Liu, Hao Sun

TL;DR

This work addresses learning nonlinear 3D dynamics directly from video by fusing multi-view target tracking, Rodrigues' rotation-based coordinate transformation, and a spline-enhanced library-based sparse regressor. It reconstructs 3D trajectories from a three-camera setup with calibration of only one camera and uses cubic B-splines to model the trajectory while enforcing physics constraints through a collocation-based sparse regression framework. The approach yields compact governing equations that closely match ground-truth dynamics across multiple synthetic chaotic systems and outperforms PySINDy in 3D equation discovery, even under noise and data gaps. The results demonstrate a practical pathway for vision-based discovery of dynamics with potential applications in robotics, surveillance, and scientific sensing, and point to future work on real-world videos and multi-target dynamics.
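The spline step can be illustrated with a minimal NumPy sketch. It assumes a clamped cubic B-spline with uniformly spaced knots, fitted by ordinary least squares to only the samples that survive a simulated data gap. Differentiating the basis then gives $\dot{x}$ at arbitrary collocation points, including points inside the gap. The helper names (`clamped_knots`, `bspline_basis`, `bspline_basis_deriv`), the knot count and the noise level are illustrative assumptions, not the authors' settings. The physics-constrained joint optimization described above is not reproduced here.

```python
import numpy as np

def clamped_knots(t0, t1, n_intervals, degree=3):
    """Uniform interior knots with the end knots repeated (clamped spline)."""
    interior = np.linspace(t0, t1, n_intervals + 1)
    return np.concatenate([[t0] * degree, interior, [t1] * degree])

def bspline_basis(x, knots, degree):
    """Basis matrix N[j, i] = B_i(x_j) via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator functions of the half-open knot spans.
    B = np.zeros((x.size, len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Close the last non-empty span so the right end point is covered.
    last = np.nonzero(knots[:-1] < knots[1:])[0].max()
    B[x == knots[-1], last] = 1.0
    for p in range(1, degree + 1):
        B_next = np.zeros((x.size, len(knots) - p - 1))
        for i in range(len(knots) - p - 1):
            d1 = knots[i + p] - knots[i]
            d2 = knots[i + p + 1] - knots[i + 1]
            if d1 > 0:
                B_next[:, i] += (x - knots[i]) / d1 * B[:, i]
            if d2 > 0:
                B_next[:, i] += (knots[i + p + 1] - x) / d2 * B[:, i + 1]
        B = B_next
    return B

def bspline_basis_deriv(x, knots, degree):
    """Time derivative of each basis function, built from the degree-1 bases."""
    B_low = bspline_basis(x, knots, degree - 1)
    dB = np.zeros((np.size(x), len(knots) - degree - 1))
    for i in range(dB.shape[1]):
        d1 = knots[i + degree] - knots[i]
        d2 = knots[i + degree + 1] - knots[i + 1]
        if d1 > 0:
            dB[:, i] += degree / d1 * B_low[:, i]
        if d2 > 0:
            dB[:, i] -= degree / d2 * B_low[:, i + 1]
    return dB

# Hypothetical noisy samples of x(t) = sin(t) with a block of missing frames.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 400)
x = np.sin(t) + 0.005 * rng.standard_normal(t.size)
observed = (t < 2.5) | (t > 3.5)

# Least-squares fit of the control points p on the observed samples only.
knots = clamped_knots(0.0, 2.0 * np.pi, n_intervals=8)
p, *_ = np.linalg.lstsq(bspline_basis(t[observed], knots, 3), x[observed], rcond=None)

# Evaluate the spline and its time derivative at dense collocation points.
t_c = np.linspace(0.0, 2.0 * np.pi, 500)
x_c = bspline_basis(t_c, knots, 3) @ p
xdot_c = bspline_basis_deriv(t_c, knots, 3) @ p
```

Because the derivative comes from the same control points as the fitted curve, the collocation points can be placed anywhere on the time axis, independent of which frames were actually observed.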

Abstract

Data-driven discovery of governing equations has kindled significant interest in many science and engineering areas. Existing studies primarily focus on uncovering the equations that govern nonlinear dynamics from direct measurements of the system states (e.g., trajectories). Little effort has been devoted to distilling governing laws of dynamics directly from videos of targets moving in 3D space. To this end, we propose a vision-based approach that automatically uncovers the governing equations of nonlinear dynamics for 3D moving targets from raw videos recorded by a set of cameras. The approach is composed of three key blocks: (1) a target tracking module that extracts the planar pixel motion of the moving target in each video, (2) a coordinate transformation learning module based on Rodrigues' rotation formula that reconstructs the 3D coordinates with respect to a predefined reference point, and (3) a spline-enhanced library-based sparse regressor that uncovers the underlying governing law of the dynamics. This framework can effectively handle the challenges associated with measurement data, e.g., noise in the video and imprecise tracking of the target, which leads to missing data. The efficacy of our method has been demonstrated on multiple sets of synthetic videos covering different nonlinear dynamical systems.
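For the sparse regression block, the sketch below swaps in sequentially thresholded least squares (STLSQ), the algorithm popularized by SINDy and the default optimizer in PySINDy, rather than the authors' spline-enhanced regressor. It assumes a quadratic polynomial library, slightly noisy derivatives sampled at random states of the Lorenz system, and a threshold of 0.1. The two metric helpers follow the evaluation metrics named in the Figure 3 caption. Counting "incorrectly identified" coefficients as a mismatch between the zero/nonzero patterns of the predicted and true coefficients is an assumption.

```python
import numpy as np

def poly2_library(X):
    """Candidate terms [1, x, y, z, x^2, xy, xz, y^2, yz, z^2] for each state row."""
    x, y, z = X.T
    return np.column_stack([np.ones_like(x), x, y, z,
                            x * x, x * y, x * z, y * y, y * z, z * z])

def stlsq(Theta, dX, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: prune small terms, refit the rest."""
    Xi, *_ = np.linalg.lstsq(Theta, dX, rcond=None)
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                coef, *_ = np.linalg.lstsq(Theta[:, keep], dX[:, k], rcond=None)
                Xi[keep, k] = coef
    return Xi

def relative_l2_error(Xi_pred, Xi_true):
    """Relative l2 error of the whole coefficient matrix."""
    return np.linalg.norm(Xi_pred - Xi_true) / np.linalg.norm(Xi_true)

def n_wrong_terms(Xi_pred, Xi_true):
    """Count terms whose zero/nonzero status differs from the ground truth."""
    return int(np.sum((Xi_pred != 0) != (Xi_true != 0)))

# Ground-truth Lorenz coefficients (sigma=10, rho=28, beta=8/3), one column per equation.
Xi_true = np.zeros((10, 3))
Xi_true[[1, 2], 0] = [-10.0, 10.0]          # dx/dt = -10x + 10y
Xi_true[[1, 2, 6], 1] = [28.0, -1.0, -1.0]  # dy/dt = 28x - y - xz
Xi_true[[3, 5], 2] = [-8.0 / 3.0, 1.0]      # dz/dt = -(8/3)z + xy

# Hypothetical data: random states with slightly noisy derivatives.
rng = np.random.default_rng(0)
X = rng.uniform(-20.0, 20.0, size=(500, 3))
Theta = poly2_library(X)
dX = Theta @ Xi_true + 0.01 * rng.standard_normal((500, 3))

Xi = stlsq(Theta, dX, threshold=0.1)
err = relative_l2_error(Xi, Xi_true)
wrong = n_wrong_terms(Xi, Xi_true)
```

In a full pipeline, the rows of `Theta` and `dX` would come from the reconstructed 3D trajectory and its spline derivative at collocation points rather than from random states.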

Paper Structure

This paper contains 23 sections, 19 equations, 15 figures, and 5 tables.

Figures (15)

  • Figure 1: Schematic of vision-based discovery of nonlinear dynamics for a 3D moving target. First, we record the motion trajectory of the object in 3D space using multiple cameras in a predefined reference coordinate system (see a). Pixel trajectory coordinates are obtained through target identification and tracking. The camera parameters include the camera's position, the normal vector of the camera's view plane, and the calibrated parameters, namely the scaling factor and the tilt angle. We then use coordinate learning and transformation to obtain the spatial motion trajectory in the reference coordinate system. Second, for each dimension of the trajectory, we introduce a spline-enhanced library-based sparse regressor to uncover the underlying governing law of dynamics. The time derivatives of the trajectory and the spline curve are denoted by $\dot{\mathbf{x}} = d\mathbf{x}/dt$ and $\dot{\mathbf{G}} = d\mathbf{G}/dt$, respectively (see b).
  • Figure 2: Discovered 3D trajectories vs. the ground truth.
  • Figure 3: The influence of noisy and missing data (e.g., random block and fiber missing) on the experimental results, using the SprottF video data as an example (results for the other systems are given in the Appendix). The evaluation metrics are the $\ell_2$ relative error and the number of incorrectly identified equation coefficients. To test the model's robustness, we analyze the effect of (a) the noise level, (b) the random block missing rate, and (c) the fiber missing rate.
  • Figure 4: Example of a synthetic dataset simulating real-world scenarios. a. An example of the generated video for an irregularly shaped object undergoing random self-rotation and size variations. The video frames were perturbed with zero-mean Gaussian noise (variance = 0.01), and a tree-like obstruction was introduced to further simulate real-world complexity. b. The reconstructed 3D trajectory of the observed target under occlusion-induced missing data. The shaded areas indicate the regions affected by the obstruction. Our approach reconstructs the 3D point trajectories from sparse observation points, enabling accurate discovery of the underlying governing equations. The video file is provided in the supplementary material.
  • Figure S1: Schematic of trajectory projection from the 3D space to a 2D plane. The blue trajectory represents the 3D motion trajectory, while the red trajectory represents its projection on the 2D camera plane. Here, $C1$ denotes the position of the camera. The normal vector of the camera plane $X^{'}OY^{'}$ is denoted as $Z^{'}$.
  • ...and 10 more figures
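The camera model described in the Figure 1 and Figure S1 captions can be sketched as follows, assuming an orthographic (parallel) projection onto each camera's view plane. Rodrigues' rotation formula aligns the camera normal with the $Z'$ axis, the depth component is dropped, and the in-plane coordinates are rotated by the tilt angle and multiplied by the scaling factor. Unlike the paper, which calibrates only one camera and learns the transformation for the others, this sketch assumes all camera parameters are known and recovers the 3D trajectory by linear least squares over three views. The dictionary keys and function names are hypothetical.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rodrigues' rotation formula: R = I + sin(theta) K + (1 - cos(theta)) K^2."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def align_to_z(normal):
    """Rotation that maps the unit camera normal onto the Z' axis (0, 0, 1)."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(n, z)
    s, c = np.linalg.norm(axis), float(n @ z)
    if s < 1e-12:  # normal is parallel or anti-parallel to Z'
        return np.eye(3) if c > 0 else rodrigues([1.0, 0.0, 0.0], np.pi)
    return rodrigues(axis, np.arctan2(s, c))

def camera_matrix(cam):
    """2x3 map from world offsets (X - C) to pixel coordinates (orthographic)."""
    P = align_to_z(cam["normal"])[:2, :]    # in-plane axes; depth is dropped
    ct, st = np.cos(cam["tilt"]), np.sin(cam["tilt"])
    tilt = np.array([[ct, -st], [st, ct]])  # in-plane tilt of the image axes
    return cam["scale"] * tilt @ P

def project(X, cam):
    """Pixel trajectory (T x 2) seen by one camera for world points X (T x 3)."""
    return (X - cam["position"]) @ camera_matrix(cam).T

def triangulate(pixels, cams):
    """Least-squares 3D points from several views: A_i X = u_i + A_i C_i."""
    mats = [camera_matrix(c) for c in cams]
    A = np.vstack(mats)  # (2m, 3) stacked camera maps
    rhs = np.hstack([u + M @ c["position"] for u, M, c in zip(pixels, mats, cams)])  # (T, 2m)
    X, *_ = np.linalg.lstsq(A, rhs.T, rcond=None)
    return X.T

# Hypothetical three-camera rig looking at the origin from +z, +x and +y.
cams = [
    {"position": np.array([0.0, 0.0, 10.0]), "normal": [0, 0, -1], "scale": 50.0, "tilt": 0.0},
    {"position": np.array([10.0, 0.0, 0.0]), "normal": [-1, 0, 0], "scale": 40.0, "tilt": 0.1},
    {"position": np.array([0.0, 10.0, 0.0]), "normal": [0, -1, 0], "scale": 45.0, "tilt": -0.2},
]
t = np.linspace(0.0, 10.0, 200)
X_true = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])  # helical test path
pixels = [project(X_true, c) for c in cams]
X_rec = triangulate(pixels, cams)
```

Each orthographic view constrains only the two coordinates lying in its image plane, so any two cameras with non-parallel normals already determine a point. The third view adds redundancy that the least-squares solve can exploit when the tracked pixels are noisy.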