Modeling cyclostationarity in time series using ASCA

Daniel Vallejo-España, Jesús García Sánchez, Manuel Villar-Argaiz, Concepción De Linares, José Camacho

TL;DR

This work proposes a unified pipeline for the exploratory analysis of cyclostationary time series using ANOVA Simultaneous Component Analysis (ASCA), an extension of ANOVA that handles both univariate and multivariate data. It also observes that, owing to its multivariate nature, ASCA separates variability across factors better than ANOVA in unbalanced designs.

Abstract

Modern data analysis across diverse disciplines increasingly relies on time series. Many of these datasets exhibit cyclostationarity, where patterns approximately repeat at regular intervals, often across multiple time scales such as daily, weekly or yearly cycles. In this context, statistical inference is essential to distinguish genuine underlying effects from random variability. While tools like Analysis of Variance (ANOVA) provide such inference, they often lack interpretability and struggle with the complexities of multivariate data. To address these limitations, we propose a unified pipeline for the exploratory analysis of cyclostationary time series using ANOVA Simultaneous Component Analysis (ASCA), an extension of ANOVA that handles both univariate and multivariate data. By combining inference with the visualization capabilities of Principal Component Analysis (PCA), ASCA offers powerful tools for interpretation. ASCA's capabilities are well established in the analysis of experimental data, but they remain largely unexplored for observational data such as time series. Our workflow introduces an algorithmic approach to modeling time-dependent data with ASCA that controls for multiple cyclostationary time scales while also accounting for challenges specific to this type of data, such as autocorrelation. Furthermore, we observe that, owing to its multivariate nature, ASCA separates variability across factors better than ANOVA in unbalanced designs. We demonstrate the efficacy of this methodology through two real-world case studies: water temperature trends in mountain lakes in Sierra Nevada, Spain, and 30 years of airborne pollen trends recorded in the city of Granada, Spain.
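
The abstract describes ASCA only in words. The following is a minimal NumPy sketch of the core idea, assuming a balanced, fully crossed two-factor design (hypothetical factors "year" and "season" with equal replicates per cell): the centred data are split into one effect matrix per factor plus an interaction and residuals, each effect matrix is summarised by PCA (via SVD), the percentage of sum of squares per term is computed, and a factor's significance is assessed by permuting its labels. It is not the authors' pipeline: unbalanced designs, which the paper discusses, generally need a regression-based decomposition, and plain label permutation assumes exchangeable observations, an assumption that autocorrelation in time series can violate. All function names and the synthetic data are illustrative.

```python
import numpy as np


def effect_matrix(Xc, labels):
    """Replace every row of the centred data by the mean row of its factor level."""
    M = np.zeros_like(Xc)
    for level in np.unique(labels):
        rows = labels == level
        M[rows] = Xc[rows].mean(axis=0)
    return M


def asca_decompose(X, a, b):
    """Mean-based split Xc = Xa + Xb + Xab + E (assumes a balanced, crossed design)."""
    Xc = X - X.mean(axis=0)                       # remove the overall mean 1m^T
    Xa = effect_matrix(Xc, a)                     # main effect of factor a
    Xb = effect_matrix(Xc, b)                     # main effect of factor b
    cells = np.array([f"{i}|{j}" for i, j in zip(a, b)])
    Xab = effect_matrix(Xc, cells) - Xa - Xb      # interaction a x b
    E = Xc - Xa - Xb - Xab                        # residuals
    return Xc, {"a": Xa, "b": Xb, "ab": Xab}, E


def pca(M, n_comp=2):
    """PCA of an effect matrix via SVD; effect matrices already have zero-mean columns."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    loadings = Vt[:n_comp].T                      # variables x components
    scores = M @ loadings                         # observations x components
    return scores, loadings


def pct_ss(M, Xc):
    """Percentage of the total (centred) sum of squares captured by M."""
    return 100.0 * np.sum(M ** 2) / np.sum(Xc ** 2)


def permutation_pvalue(Xc, labels, n_perm=999, seed=0):
    """Permutation test for one main effect, using its sum of squares as statistic."""
    rng = np.random.default_rng(seed)
    observed = np.sum(effect_matrix(Xc, labels) ** 2)
    exceed = sum(
        np.sum(effect_matrix(Xc, rng.permutation(labels)) ** 2) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)


# Synthetic example: 4 hypothetical "years" x 3 "seasons" x 2 replicates, 5 variables.
rng = np.random.default_rng(1)
year = np.repeat(np.arange(4), 6)
season = np.tile(np.repeat(np.arange(3), 2), 4)
X = rng.normal(size=(24, 5))
X[:, 0] += 3.0 * year                             # only variable 0 trends with year

Xc, effects, E = asca_decompose(X, year, season)
scores_year, loadings_year = pca(effects["a"])
total_pct = sum(pct_ss(M, Xc) for M in effects.values()) + pct_ss(E, Xc)
p_year = permutation_pvalue(Xc, year)
```

In a balanced design the effect matrices and residuals are mutually orthogonal, so their percentages of sum of squares add up to 100; that additivity is what breaks down in unbalanced designs.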

Paper Structure (14 sections, 7 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Pipeline for modeling cyclostationarity with ASCA.
  • Figure 2: Energy consumption dataset representation. A tensor representation in which the four modes are city, hour of the day, day of the week, and week of the year. B matrix unfolding with modes 'city' and 'week of the year' as rows, and 'hour of the day' and 'day of the week' as columns. C matrix unfolding with modes 'city', 'week of the year' and 'day of the week' as rows, and 'hour of the day' as columns.
  • Figure 3: ASCA model of the Sierra Nevada lakes dataset. A scores for factor 'year'. B loadings for factor 'year', colored by season. C scores for PCs 1 and 2 of factor 'sensor', colored by lake. D loadings for PCs 1 and 2 of factor 'sensor', colored by season. E number of missing values per sensor and year, expressed in months (one month corresponds to 240 missing values). The gray band marks the threshold above which rows were removed. F percentage of the sum of squares of the ASCA factorization per variable (column of the data matrix), in blue; the mean of this vector is shown in green. The total percentage of explained variance for ASCA and for the equivalent permutation-based ANOVA model is shown in black and red, respectively.
  • Figure 4: ASCA model of the pollen dataset: main factors. A scores of factor year, colored by year. Pollen concentrations increase markedly from 2018 to 2022 compared with previous years. B loadings of factor year, colored to highlight the pollen types with the largest growth and decay over the years. C biplot of factor fortnight, with scores colored by fortnight. The scores trace the annual cycle of the seasons, while the loadings indicate which pollen types have the highest particle counts at each time of the year. D diagram of the yearly clock-like pattern in the biplot of factor fortnight. E scores of the first component of the interaction between year and fortnight, sorted by season and year. All seasons remain constant over the years except spring, which shows increasing pollen concentrations. F loadings of the first component of the interaction between year and fortnight, colored to show the pollen types with the largest and smallest spring increase over the years. Quercus and Plantago show the largest spring growth, while Artemisia and Fraxinus show the smallest. G normalized yearly counts of Quercus, Plantago and Artemisia pollen types across the 30 years. Quercus and Plantago both show spring growth in the later years, while Artemisia counts decrease in seasons other than spring.
The pollen types shown in the figures are named as acronyms: Arte, Artemisia; Casu, Casuarina; Frax, Fraxinus; Alnu, Alnus; Ulmu, Ulmus; Cupr, Cupressaceae; Popu, Populus; Acer, Acer; Plat, Platanus; Sali, Salix; Mora, Moraceae; Urti, Urticaceae; Pinu, Pinus; Bras, Brassicaceae; Indet, Indeterminate; Quer, Quercus; Plan, Plantago; Rume, Rumex; Poac, Poaceae; Cast, Castanea; Olea, Olea; Comp, Compositae; Apia, Apiaceae; Chen, Amaranthaceae.
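
Figure 2 shows how a multi-way time series is unfolded into a matrix, which decides which time scales become ASCA factors (rows) and which become variables (columns). The following is a minimal NumPy sketch of unfoldings B and C, assuming a synthetic tensor with hypothetical dimensions (3 cities, 52 weeks) stored with axis order (city, hour, day, week). The row ordering, with city varying slowest, is our own convention and not necessarily the authors'; the function names are illustrative.

```python
import numpy as np

# Hypothetical dimensions: 3 cities, 24 hours, 7 days of the week, 52 weeks.
n_city, n_hour, n_day, n_week = 3, 24, 7, 52
rng = np.random.default_rng(0)
# Synthetic "consumption" tensor with modes (city, hour, day, week).
T = rng.gamma(shape=2.0, scale=1.0, size=(n_city, n_hour, n_day, n_week))


def unfold_b(T):
    """Panel B: rows indexed by (city, week), columns by (hour, day)."""
    n_c, n_h, n_d, n_w = T.shape
    return T.transpose(0, 3, 1, 2).reshape(n_c * n_w, n_h * n_d)


def unfold_c(T):
    """Panel C: rows indexed by (city, week, day), columns by hour."""
    n_c, n_h, n_d, n_w = T.shape
    return T.transpose(0, 3, 2, 1).reshape(n_c * n_w * n_d, n_h)


B = unfold_b(T)   # 156 x 168: each row is one week's 24 x 7 profile for one city
C = unfold_c(T)   # 1092 x 24: each row is one day's 24-hour profile

# Factor labels for the rows, in the same (C-order) sequence as the reshape,
# so that they can be passed to an ASCA model as design factors.
gb = np.meshgrid(np.arange(n_city), np.arange(n_week), indexing="ij")
city_b, week_b = gb[0].ravel(), gb[1].ravel()
gc = np.meshgrid(np.arange(n_city), np.arange(n_week), np.arange(n_day), indexing="ij")
city_c, week_c, day_c = (g.ravel() for g in gc)
```

In unfolding B, day of the week is absorbed into the 168 variables; in C it becomes a third row factor (day_c) alongside city and week, leaving 24 hourly variables per row.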