Classifying Complex Dynamical and Stochastic Systems via Physics-Based Recurrence Features
J. V. M. Silveira, H. C. Costa, G. S. Spezzatto, T. L. Prado, S. R. Lopes
TL;DR
The paper tackles the challenge of classifying the parameters of discrete (chaotic) and continuous dynamical systems, as well as stochastic signals (colored noise), from their time series. It uses recurrence microstate probabilities, together with an entropy-based choice of the recurrence threshold, to build a compact, physics-informed feature space, and then evaluates a suite of ML classifiers on this representation. The key finding is that recurrence microstate features substantially improve classification accuracy and reduce computation compared with raw data. Random Forest and MLP frequently perform best; for example, the Lorenz system reaches 100% accuracy with microstate size $N=4$. The approach offers a practical, scalable way to extract dynamical signatures from time series and is applicable to real-world data from domains such as neuroscience and climatology. The method rests on the recurrence entropy $S(\varepsilon) = -\sum_{i=1}^{2^{N^{2}}} P_i(\varepsilon)\ln P_i(\varepsilon)$, where $P_i(\varepsilon)$ is the probability of the $i$-th of the $2^{N^2}$ possible $N\times N$ recurrence microstates at threshold $\varepsilon$. This and related recurrence quantifiers enable robust discrimination of dynamical regimes in a compact feature space.
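To make the quantities behind $S(\varepsilon)$ concrete, here is a minimal Python/NumPy sketch (not the authors' code). It estimates the microstate probabilities $P_i(\varepsilon)$ by sampling random $N\times N$ blocks of the recurrence matrix and then evaluates the entropy. It assumes a scalar series without phase-space embedding and the recurrence rule $R_{ij}=1$ if $|x_i-x_j|\le\varepsilon$. It also assumes that "entropy-based thresholding" means choosing the $\varepsilon$ that maximizes $S(\varepsilon)$ on a grid. The function names, sample count, grid, and the logistic-map test signal are hypothetical choices.

```python
import numpy as np

def microstate_probabilities(x, eps, N=2, n_samples=10_000, seed=0):
    """Estimate P_i(eps) for the 2**(N*N) possible N x N recurrence microstates.

    Assumption: R[i, j] = 1 if |x[i] - x[j]| <= eps else 0 (scalar series,
    no embedding); microstates are N x N blocks sampled at random positions.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    T = len(x)
    off = np.arange(N)
    # Random top-left corners (i, j) of the sampled N x N blocks.
    i0 = rng.integers(0, T - N + 1, size=n_samples)
    j0 = rng.integers(0, T - N + 1, size=n_samples)
    rows = x[i0[:, None] + off]          # (n_samples, N): x_i .. x_{i+N-1}
    cols = x[j0[:, None] + off]          # (n_samples, N): x_j .. x_{j+N-1}
    block = np.abs(rows[:, :, None] - cols[:, None, :]) <= eps  # (n_samples, N, N)
    # Read each binary block as an integer label in [0, 2**(N*N)).
    labels = block.reshape(n_samples, N * N).astype(np.int64) @ (2 ** np.arange(N * N))
    counts = np.bincount(labels, minlength=2 ** (N * N))
    return counts / n_samples

def recurrence_entropy(p):
    """S = -sum_i P_i ln P_i, skipping microstates that never occur (0 ln 0 = 0)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def max_entropy_threshold(x, N=2, n_grid=50, **kwargs):
    """Hypothetical helper: pick the eps on a grid that maximizes S(eps) (assumed criterion)."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(0.01, 1.0, n_grid) * (x.max() - x.min())
    S = np.array([recurrence_entropy(microstate_probabilities(x, e, N, **kwargs))
                  for e in grid])
    k = int(np.argmax(S))
    return float(grid[k]), float(S[k])

if __name__ == "__main__":
    # Hypothetical toy signal: logistic map x -> 3.9 x (1 - x).
    x = np.empty(2000)
    x[0] = 0.3
    for t in range(len(x) - 1):
        x[t + 1] = 3.9 * x[t] * (1.0 - x[t])
    eps, S = max_entropy_threshold(x, N=2)
    features = microstate_probabilities(x, eps, N=2)  # 16 numbers instead of 2000
    print(f"eps* = {eps:.3f}, S = {S:.3f} (upper bound N^2 ln 2 = {4 * np.log(2):.3f})")
    print("feature vector length:", features.size)
```

Whatever the length of the input series, the output is a vector of $2^{N^2}$ probabilities (16 for $N=2$, 65,536 for $N=4$), which is the compact representation a classifier would receive. Sampling random blocks keeps the cost independent of building the full $T\times T$ recurrence matrix.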
Abstract
In this study, we employ the recently developed recurrence microstate probabilities as features to improve the accuracy of several well-established machine learning (ML) algorithms. These algorithms are applied to classify discrete and continuous dynamical systems, as well as colored noise. We demonstrate that the dynamical characteristics quantified by this method are effectively captured in the recurrence microstate space, a space defined solely by the recurrence properties of the signal. This change of space reduces dimensionality, which in turn reduces the time needed to perform calculations and obtain relevant information about the underlying system. We also demonstrate that a few ML algorithms are particularly effective for classification when combined with recurrence microstates. Furthermore, these new feature vectors significantly reduce memory usage and computational complexity, outperforming the direct analysis of raw data.
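The pipeline described in the abstract (microstate probabilities as feature vectors, followed by a classifier) can be sketched end to end. This is a hedged toy version, not the paper's setup. The labelled data are hypothetical logistic-map series at two parameter values ($r=3.6$ and $r=3.9$). The threshold is fixed at $\varepsilon=0.05$ instead of being selected by entropy. A nearest-centroid classifier written in NumPy stands in for the Random Forest and MLP models evaluated in the paper, purely to keep the example dependency-free.

```python
import numpy as np

def microstate_features(x, eps, N=2, n_samples=5000, rng=None):
    """Recurrence microstate probabilities (length 2**(N*N)) of a scalar series."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(x)
    off = np.arange(N)
    i0 = rng.integers(0, T - N + 1, size=n_samples)
    j0 = rng.integers(0, T - N + 1, size=n_samples)
    # Sampled N x N recurrence blocks: |x_{i+a} - x_{j+b}| <= eps.
    block = np.abs(x[i0[:, None] + off][:, :, None] - x[j0[:, None] + off][:, None, :]) <= eps
    labels = block.reshape(n_samples, N * N).astype(np.int64) @ (2 ** np.arange(N * N))
    return np.bincount(labels, minlength=2 ** (N * N)) / n_samples

def logistic_series(r, T, rng, transient=200):
    """Hypothetical toy data: logistic map x -> r x (1 - x) from a random start."""
    x = rng.uniform(0.2, 0.8)
    out = np.empty(T)
    for t in range(transient + T):
        x = r * x * (1.0 - x)
        if t >= transient:
            out[t - transient] = x
    return out

def nearest_centroid_fit(X, y):
    """Stand-in classifier (not RF/MLP): one mean feature vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    """Assign each feature vector to the class with the closest centroid."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(42)
r_values = [3.6, 3.9]  # hypothetical parameter values used as class labels
X, y = [], []
for label, r in enumerate(r_values):
    for _ in range(40):
        X.append(microstate_features(logistic_series(r, 1000, rng), eps=0.05, rng=rng))
        y.append(label)
X, y = np.array(X), np.array(y)

# Random 60/20 train/test split.
perm = rng.permutation(len(y))
train, test = perm[:60], perm[60:]
model = nearest_centroid_fit(X[train], y[train])
accuracy = float(np.mean(nearest_centroid_predict(model, X[test]) == y[test]))
print(f"feature length: {X.shape[1]}, test accuracy: {accuracy:.2f}")
```

Each 1000-point series enters the classifier as only 16 numbers. Swapping in a Random Forest or MLP (for example scikit-learn's `RandomForestClassifier` or `MLPClassifier`) would change only the fit and predict step, not the feature extraction.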
