Real-Time Motion Detection Using Dynamic Mode Decomposition
Marco Mignacca, Simone Brugiapaglia, Jason J. Bramburger
TL;DR
This paper introduces a real-time motion-detection method for streaming video grounded in Dynamic Mode Decomposition (DMD). The method applies windowed, compressed DMD and monitors the spectrum of continuous-time eigenvalues $\omega_m = \log(\lambda_m)/h$: motion is detected as a sharp change, from one window to the next, in the spectrum averaged within each window, after which the DMD modes are used to separate background from foreground. The approach is validated on a curated dataset emulating security footage and on benchmark videos, with performance quantified by ROC curves (mean AUC $\approx 0.99$) and a threshold-optimization framework that combines a weighted error metric with a variant of $k$-fold cross-validation. The work demonstrates a simple, fast alternative to neural networks for motion detection, driven by dynamical systems theory, that can isolate foreground motion in real time, while highlighting the need for context-specific threshold tuning. Key limitations include potential insensitivity to very slow movement and to short-lived spikes; in return, the method offers a transparent, implementable framework with open-source code for reproducibility.
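The detection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `flag_motion`, the choice of the mean modulus $|\omega_m|$ as the per-window "average spectrum", and the use of an absolute difference between consecutive windows are all hypothetical choices made for this example.

```python
import numpy as np

def flag_motion(window_spectra, threshold):
    """Flag windows whose average spectrum jumps relative to the previous window.

    window_spectra: list of 1-D complex arrays, one array of continuous-time
                    eigenvalues omega_m per window (assumed already computed).
    threshold:      hypothetical tuning parameter; a jump larger than this
                    is reported as motion.
    """
    # Assumption: summarize each window by the mean modulus of its eigenvalues.
    means = np.array([np.mean(np.abs(w)) for w in window_spectra])
    # Change in the summary statistic between consecutive windows.
    jumps = np.abs(np.diff(means))
    # jumps[i] compares window i with window i + 1, so report index i + 1.
    return [int(i) + 1 for i in np.flatnonzero(jumps > threshold)]

# Toy input: three quiet windows, one busy window, then quiet again.
quiet = np.array([0.01 + 0.1j, 0.01 - 0.1j])
busy = np.array([0.5 + 3.0j, 0.5 - 3.0j])
spectra = [quiet, quiet, quiet, busy, quiet]
flagged = flag_motion(spectra, threshold=1.0)
```

With this rule both the onset of the busy window and the return to the quiet state register as sharp changes, so a practical detector would need an additional convention for which of the two to report.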
Abstract
Dynamic Mode Decomposition (DMD) is a numerical method that seeks to fit time-series data to a linear dynamical system. In doing so, DMD decomposes dynamic data into spatially coherent modes that evolve in time according to exponential growth/decay or with a fixed frequency of oscillation. A popular application of DMD has been to video, where the high-dimensional vector of pixel values is interpreted as a state that evolves in time as the video plays. In this work, we propose a simple and interpretable motion detection algorithm for streaming video data rooted in DMD. Our method leverages the correspondence between the evolution of important video features, such as foreground motion, and the eigenvalues of the matrix that results from applying DMD to segments of video. We apply the method to a database of test videos that emulate security footage under varying realistic conditions. Effectiveness is analyzed using receiver operating characteristic curves, and cross-validation is used to optimize the threshold parameter that identifies movement.
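To make the fitting step concrete, the following is a minimal sketch of standard exact DMD applied to a snapshot matrix. It is not the windowed, compressed variant used in the paper; the function name `dmd_eigs`, the truncation `rank`, and the synthetic data are assumptions introduced for illustration. The example builds data from a single decaying oscillation and checks that the continuous-time eigenvalues $\omega_m = \log(\lambda_m)/h$ recover its decay rate and frequency.

```python
import numpy as np

def dmd_eigs(X, h, rank):
    """Exact DMD of a snapshot matrix X whose columns are spaced h apart.

    Returns discrete-time eigenvalues lam, continuous-time eigenvalues
    omega = log(lam) / h, and the corresponding DMD modes.
    """
    X1, X2 = X[:, :-1], X[:, 1:]            # snapshot pairs (x_k, x_{k+1})
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    # Reduced operator U^* X2 V S^{-1}; dividing by s scales the columns.
    A_tilde = U.conj().T @ X2 @ V / s
    lam, W = np.linalg.eig(A_tilde)
    modes = (X2 @ V / s) @ W                # exact DMD modes
    omega = np.log(lam.astype(complex)) / h
    return lam, omega, modes

# Synthetic "video": 50 pixels, 40 frames, one decaying oscillation
# with decay rate -0.1 and angular frequency 2.
h = 0.1
t = h * np.arange(40)
rng = np.random.default_rng(0)
a, b = rng.standard_normal((2, 50))         # two fixed spatial patterns
X = (np.outer(a, np.exp(-0.1 * t) * np.cos(2 * t))
     + np.outer(b, np.exp(-0.1 * t) * np.sin(2 * t)))

lam, omega, modes = dmd_eigs(X, h, rank=2)
```

Here the real part of each `omega` entry gives the growth or decay rate of a mode and the imaginary part gives its oscillation frequency, which is the interpretation the abstract relies on.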
