On the Optimization of Singular Spectrum Analyses: A Pragmatic Approach

Fernando Lopes, Dominique Gibert, Vincent Courtillot, Jean-Louis Le Mouël, Jean-Baptiste Boulé

TL;DR

The paper tackles the computational bottlenecks of Singular Spectrum Analysis (SSA) when applied to long, highly sampled time series. It introduces a pragmatic-SSA pipeline that preserves the Hankel embedding and uses signal extension, randomized-SVD, energy-based thresholding, and hierarchical clustering to accelerate diagonalization and improve separability. The authors validate the approach on polar motion data and high-frequency tree biosignals, achieving results comparable to canonical SSA without downsampling and with reduced computation. This work broadens SSA applicability to large real-world datasets and offers a practical framework for robust mode separation in nonstationary signals.
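Two of the ingredients named above, randomized SVD and energy-based thresholding, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`hankel_embed`, `randomized_svd`, `energy_rank`), the oversampling parameter, and the use of a Gaussian test matrix are hypothetical choices. The two power iterations and the 90% threshold on the cumulative sum of singular values simply mirror the values quoted in the figure captions.

```python
import numpy as np

def hankel_embed(x, L):
    """Trajectory (Hankel) matrix of shape (L, K), with K = N - L + 1."""
    x = np.asarray(x, dtype=float)
    K = x.size - L + 1
    return x[np.arange(L)[:, None] + np.arange(K)[None, :]]

def randomized_svd(A, rank, n_oversample=10, n_power=2, seed=0):
    """Approximate rank-`rank` SVD via a random range finder (assumed variant)."""
    rng = np.random.default_rng(seed)
    k = min(rank + n_oversample, min(A.shape))
    # Sample the column space of A with a Gaussian test matrix.
    Q = np.linalg.qr(A @ rng.standard_normal((A.shape[1], k)))[0]
    # Power iterations sharpen the captured subspace; QR keeps them stable.
    for _ in range(n_power):
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    # Exact SVD of the small projected matrix, then map back.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]

def energy_rank(s, threshold=0.90):
    """Smallest count of leading singular values whose cumulative sum
    reaches `threshold` of the total (assumption: plain, not squared, values)."""
    s = np.asarray(s, dtype=float)
    c = np.cumsum(s) / np.sum(s)
    return min(int(np.searchsorted(c, threshold)) + 1, s.size)

# Usage: two sinusoids give a Hankel matrix of rank 4.
t = np.arange(500)
x = np.sin(2 * np.pi * t / 50) + 0.5 * np.sin(2 * np.pi * t / 13)
X = hankel_embed(x, L=100)
U, s, Vt = randomized_svd(X, rank=4)
```

The projected matrix `Q.T @ A` has only `rank + n_oversample` rows, so the final exact SVD stays cheap even when the Hankel matrix itself is large; that is the source of the speed-up in this kind of scheme.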

Abstract

Singular Spectrum Analysis (SSA) occupies a prominent place in the real signal analysis toolkit alongside Fourier and Wavelet analysis. Compared with these two analyses, SSA additionally allows the separation of patterns directly from the data space into the data space, with data that need not be strictly stationary, continuous, or even regularly sampled. In most cases, SSA relies on a combination of Hankel or Toeplitz matrices and Singular Value Decomposition (SVD). Like Fourier and Wavelet analysis, SSA has its limitations. The main bottlenecks of the method can be summarized in three points. The first is the diagonalization of the Hankel/Toeplitz matrix, which can become a major problem in terms of memory and/or computation if the time series to be analyzed is very long or heavily sampled. The second point concerns the size of the analysis window, typically denoted 'L', which affects both the detection of patterns in the time series and the dimensions of the Hankel/Toeplitz matrix. Finally, the third point concerns pattern reconstruction: how to easily identify, in the eigenvector/eigenvalue space, which patterns should be grouped. We address each of these issues by describing an approach that we hope is effective, that we have been developing for over 10 years, and that has yielded good results in our research work.
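For readers unfamiliar with the basic procedure, the canonical Hankel-plus-SVD form of SSA can be sketched as follows. This is a minimal textbook-style illustration, not the pragmatic pipeline of the paper; the function name `ssa_components` and the test signal are hypothetical. The window length `L` sets the Hankel matrix dimensions (L rows, N - L + 1 columns), and each elementary component is mapped back to the data space by averaging over anti-diagonals.

```python
import numpy as np

def ssa_components(x, L):
    """Basic SSA: Hankel embedding, SVD, and diagonal averaging.

    Returns the singular values, one reconstructed series per eigentriplet,
    and the number of times each sample appears in the Hankel matrix.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # idx[i, j] = i + j is the sample index stored at row i, column j.
    idx = np.arange(L)[:, None] + np.arange(K)[None, :]
    X = x[idx]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Samples near the boundaries appear less often than central ones.
    counts = np.bincount(idx.ravel(), minlength=N)
    comps = np.empty((s.size, N))
    for i in range(s.size):
        Xi = s[i] * np.outer(U[:, i], Vt[i])  # rank-one elementary matrix
        # Diagonal averaging: mean of Xi over each anti-diagonal i + j = n.
        comps[i] = np.bincount(idx.ravel(), weights=Xi.ravel(), minlength=N) / counts
    return s, comps, counts

# Usage: a slow trend plus an oscillation, analysed with L = 40.
t = np.arange(200)
x = 0.01 * t + np.sin(2 * np.pi * t / 25)
s, comps, counts = ssa_components(x, L=40)
```

Summing all rows of `comps` recovers the input series, while grouping a subset of rows yields one extracted pattern; choosing that subset is the grouping problem raised in the third point. The `counts` array also shows that the first and last samples enter the Hankel matrix only once, which is the boundary imbalance that extending the signal is meant to reduce.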

Paper Structure

This paper contains 15 sections, 6 equations, 11 figures.

Figures (11)

  • Figure 1: On the left is the frequency at which a colored segment of a time signal (right column) appears in the Hankel matrix. This is shown at the top for the original signal and at the bottom after the signal has been extended by duplicating its boundary segments.
  • Figure 2: Evolution of computation time (y-axis) as a function of the increasing complexity of the Hankel matrix (x-axis). The chosen rank ($q$) is arbitrarily set to $L=300$ points. In red, the computation time for the canonical SVD, and in blue, the computation time for the Randomized-SVD with two iterations of the power loop.
  • Figure 3: Singular values obtained after the SVD of the Hankel matrix of the polar motion time series (in black). In red, the singular values whose cumulative sum exceeds the threshold, set to 90%.
  • Figure 4: The $m_1$ component of the polar motion (gray curve). At the top, overlaid in blue, is the component reconstructed using all the eigentriplets obtained after a canonical SSA. At the bottom, overlaid in red, is the component reconstructed using all the eigentriplets obtained after a pragmatic SSA. The gray shaded region from 1846 to 1860 represents a recently added segment to the $m_1$ time series. As can be seen, the reconstruction in this region is less accurate for both approaches.
  • Figure 5: The Chandler wobble extracted from the data shown in Figure 4; in blue is the canonical SSA, in red is our pragmatic SSA.
  • ...and 6 more figures