On the Optimization of Singular Spectrum Analyses: A Pragmatic Approach
Fernando Lopes, Dominique Gibert, Vincent Courtillot, Jean-Louis Le Mouël, Jean-Baptiste Boulé
TL;DR
The paper tackles the computational bottlenecks of Singular Spectrum Analysis (SSA) when applied to long, highly sampled time series. It introduces a pragmatic SSA pipeline that preserves the Hankel embedding and uses signal extension, randomized SVD, energy-based thresholding, and hierarchical clustering to accelerate diagonalization and improve separability. The authors validate the approach on polar motion data and high-frequency tree biosignals, achieving results comparable to canonical SSA without downsampling and at reduced computational cost. This work broadens the applicability of SSA to large real-world datasets and offers a practical framework for robust mode separation in nonstationary signals.
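As a rough illustration of how randomized SVD and energy-based thresholding can reduce the cost of diagonalization, here is a generic NumPy sketch. It is not the authors' implementation. It assumes the standard randomized range finder of Halko, Martinsson and Tropp (Gaussian sketch, a few power iterations, then an exact SVD of the small projected matrix) and keeps the smallest number of leading components whose squared singular values reach a chosen fraction of the total energy, taken here as the squared Frobenius norm of the trajectory matrix. The function names, the oversampling of 10, the two power iterations, the 95% threshold and the synthetic test signal are all hypothetical choices.

```python
import numpy as np


def hankel_embed(x, L):
    """Return the L x K trajectory (Hankel) matrix, K = N - L + 1.

    Column j is the lagged window x[j : j + L], so entry (i, j) is x[i + j].
    """
    x = np.asarray(x, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, L).T.copy()


def randomized_svd(X, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the k leading singular triplets of X with a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    p = min(k + oversample, m, n)  # sketch size, capped by the matrix dimensions
    # Orthonormal basis for the range of X sampled by a Gaussian test matrix
    Q, _ = np.linalg.qr(X @ rng.standard_normal((n, p)))
    for _ in range(n_power_iter):  # power iterations sharpen the spectral decay
        Z, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Z)
    B = Q.T @ X  # small p x n projection of X
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


def energy_rank(s, total_energy, tau=0.95):
    """Smallest r such that the first r components carry at least a fraction tau of the energy."""
    frac = np.cumsum(s ** 2) / total_energy
    return min(int(np.searchsorted(frac, tau)) + 1, s.size)


# Hypothetical demo: two sinusoids (periods 50 and 12) plus white noise
rng = np.random.default_rng(1)
t = np.arange(2000)
x = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(t.size)

X = hankel_embed(x, L=200)  # 200 x 1801 trajectory matrix
U, s, Vt = randomized_svd(X, k=10)
r = energy_rank(s, total_energy=np.linalg.norm(X) ** 2, tau=0.95)
print(X.shape, r)
```

Only a 200 × 20 sketch and a 20 × 1801 projection are decomposed here instead of the full trajectory matrix; with the two sinusoids above, the 95% threshold should retain the four components corresponding to the two sine/cosine pairs.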
Abstract
Singular Spectrum Analysis (SSA) occupies a prominent place in the real signal analysis toolkit alongside Fourier and wavelet analysis. Compared with these two methods, SSA separates patterns directly in the data space: both the analyzed signal and the extracted components are time series, and the data need not be strictly stationary, continuous, or even regularly sampled. In most cases, SSA relies on a combination of Hankel or Toeplitz matrices and Singular Value Decomposition (SVD). Like Fourier and wavelet analysis, SSA has its limitations, and its main bottlenecks can be summarized in three points. The first is the diagonalization of the Hankel/Toeplitz matrix, which can become prohibitive in memory and/or computation time when the time series is very long or densely sampled. The second concerns the size of the analysis window, typically denoted L, which affects both the detection of patterns in the time series and the dimensions of the Hankel/Toeplitz matrix (for a series of N samples, the Hankel trajectory matrix is L × (N − L + 1)). The third concerns pattern reconstruction: how to identify easily, in the eigenvector/eigenvalue space, which components should be grouped together. We propose to address each of these issues with a pragmatic approach that we have been developing for over 10 years and that has yielded good results in our research.
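The three bottlenecks map onto the steps of basic SSA: build the L × K trajectory (Hankel) matrix, diagonalize it with an SVD, then group components and reconstruct each group as a time series by diagonal averaging. The following NumPy sketch of canonical SSA (not the authors' pragmatic pipeline) makes each step explicit. The function names and the synthetic two-sinusoid signal are illustrative assumptions; the full SVD it uses is exactly the step that becomes expensive for long series, and the grouping is chosen by hand.

```python
import numpy as np


def hankel_embed(x, L):
    """L x K trajectory (Hankel) matrix with K = N - L + 1; entry (i, j) is x[i + j]."""
    x = np.asarray(x, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, L).T.copy()


def diagonal_average(Y):
    """Map an L x K matrix back to a length N = L + K - 1 series by averaging each anti-diagonal."""
    L, K = Y.shape
    total = np.zeros(L + K - 1)
    count = np.zeros(L + K - 1)
    for i in range(L):
        total[i:i + K] += Y[i]  # entry (i, j) contributes to time index i + j
        count[i:i + K] += 1
    return total / count


def ssa_reconstruct(x, L, groups):
    """Basic SSA: embed, take the full SVD, and reconstruct one series per group of component indices."""
    X = hankel_embed(x, L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Sum the elementary matrices s_k u_k v_k^T of each group, then diagonal-average the result
    return [diagonal_average((U[:, g] * s[g]) @ Vt[g]) for g in groups]


# Hypothetical demo signal: two sinusoids with periods 50 and 12
t = np.arange(899)
slow = np.sin(2 * np.pi * t / 50)
fast = 0.5 * np.sin(2 * np.pi * t / 12)
x = slow + fast

# L = 300 and K = 600 both hold whole numbers of each period, so the sinusoids separate exactly
rec_slow, rec_fast = ssa_reconstruct(x, L=300, groups=[[0, 1], [2, 3]])
print(np.max(np.abs(rec_slow - slow)), np.max(np.abs(rec_fast - fast)))
```

In this idealized case the pairs [0, 1] and [2, 3] recover the two sinusoids essentially to machine precision. For real signals, choosing L and deciding which components belong together is precisely the difficulty raised in the second and third points.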
