Symbolic Higher-Order Analysis of Multivariate Time Series

Andrea Civilini, Fabrizio de Vico Fallani, Vito Latora

TL;DR

The paper introduces a method that detects dependencies of any order in multivariate time series data. It first transforms the multivariate time series into a symbolic sequence, and then extracts statistically significant strings of symbols through a Bayesian approach.

Abstract

Identifying patterns of relations among the units of a complex system from measurements of their activities in time is a fundamental problem with many practical applications. Here, we introduce a method that detects dependencies of any order in multivariate time series data. The method first transforms a multivariate time series into a symbolic sequence, and then extracts statistically significant strings of symbols through a Bayesian approach. Such motifs are finally modelled as the hyperedges of a hypergraph, allowing us to use network theory to study higher-order interactions in the original data. When applied to neural and social systems, our method reveals meaningful higher-order dependencies, highlighting their importance in both brain function and social behaviour.
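As a rough illustration of the first step, the sketch below follows the rule stated in the Figure 1 caption: consecutive events closer than $\Delta t$ yield adjacent symbols, otherwise a "space" symbol is inserted. It is a minimal sketch under assumptions, not the authors' implementation: the names `symbolize`, `count_tuples`, and `SPACE`, the `(time, symbol)` event format, the handling of simultaneous events, and the sliding-window counting of tuples are all hypothetical choices made for this example.

```python
from collections import Counter

SPACE = "_"  # hypothetical marker for the "space" symbol


def symbolize(events, dt):
    """Turn (time, symbol) events into a symbolic sequence.

    Two consecutive events at most `dt` apart give adjacent symbols;
    a larger gap inserts SPACE between them (assumed reading of Fig. 1).
    """
    sequence = []
    prev_time = None
    for time, symbol in sorted(events):
        if prev_time is not None and time - prev_time > dt:
            sequence.append(SPACE)
        sequence.append(symbol)
        prev_time = time
    return sequence


def count_tuples(sequence, k, ordered=True):
    """Count k-tuples of adjacent symbols that do not span a SPACE."""
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if SPACE in window:
            continue
        key = tuple(window) if ordered else tuple(sorted(window))
        counts[key] += 1
    return counts


# Toy data: three units "A", "B", "C" with event times in arbitrary units.
events = [(0.0, "A"), (0.4, "B"), (0.9, "C"), (3.0, "A"), (3.2, "B")]
sequence = symbolize(events, dt=1.0)
pairs = count_tuples(sequence, k=2)
triples = count_tuples(sequence, k=3)
```

The counts produced here are only the raw input to the significance test; deciding which tuples are motifs requires the Bayesian step, which this sketch does not attempt to reproduce.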

Paper Structure

This paper contains 16 sections, 16 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: (a) Each time series $x_i(t)$, with $i=1,\ldots,N$, is associated with one of $N$ different symbols/colours. (b) The set of time series is then converted into a symbolic sequence: if an event $x_i(t)=1$ is followed by an event $x_j(t) = 1$ within a time interval $\Delta t$ (grey bar in (a)), their corresponding symbols are placed adjacently in the symbolic sequence. Otherwise (pink cross), a "space" symbol is inserted in the sequence. (c-d) Lists of ordered and unordered $2$-tuples and $3$-tuples found in the sequence. (e) Statistically significant tuples (motifs) finally become the hyperedges of a hypergraph.
  • Figure 2: ROC and Precision-Recall curves for (a,b) the BJS score and (c,d) the $z$-score. Here, we consider 2- and 3-motifs together. The distribution of symbols in the artificial sequence follows Zipf's law ($\gamma = 1$). The alphabet contains 100 symbols (plus a special empty-space character), and we generated 50 2-motifs and 50 3-motifs, repeated 10 and 5 times, respectively. Different curves correspond to different noise levels $r_{ns}$.
  • Figure S1: Distributions of the observed occurrences of unique tuples and motifs (i.e., significant tuples) in an artificial symbolic sequence created with our algorithm, for $2$-tuples (left) and $3$-tuples (right). These distributions refer to an artificial symbolic sequence created using an alphabet of $100$ symbols (plus the empty-space) and the following parameters: $n_2 = 478$, $n_3 = 499$ (starting from 500 randomly created 2-motifs and 500 3-motifs, with duplicates subsequently removed), $r_2 = 175$, $r_3 = 25$, $r_{ns} = 10$, and $\gamma = -1$.
  • Figure S2: Testing the method on an artificial sequence with noise-to-signal ratio $r_{ns} = 10$ and $\gamma = -1$ (linear rank-frequency distribution). (a–b) Comparison between prior $\Pi(p)$ and posterior $P(p \mid \text{Data})$ distributions for a non-significant tuple and a significant one (i.e., a motif), respectively. (c–d) Histogram of the number of $2$- and $3$-tuples as a function of the Jensen-Shannon distance $d_{\text{JS}}$. A logarithmic scale is used for the $y$-axis in panel (d) to improve readability, given the larger number of possible $3$-tuples compared to motifs.
  • Figure S3: Confusion matrices for the BJS score in the classification of $2$-tuples (a) and $3$-tuples (b) in artificial data, using a significance threshold of $\text{BJS}^{\text{thr}} = 0.6$. (c,d) Bar plots showing the corresponding performance metrics of the BJS score compared to those of the $z$-score, computed on the same data with a threshold of $z^{\text{thr}} = 3$. Our method, based on the BJS score, consistently outperforms the $z$-score, particularly in the detection of higher-order motifs.
  • ...and 4 more figures
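
The Figure S2 caption compares prior and posterior distributions through the Jensen-Shannon distance $d_{\text{JS}}$. As a hedged illustration, the sketch below computes the standard Jensen-Shannon distance (the square root of the Jensen-Shannon divergence) between two discrete probability vectors. The function name `js_distance`, the use of base-2 logarithms, and the assumption that the distributions have already been discretized on a common grid are choices made for this example; this excerpt does not state how the paper evaluates $d_{\text{JS}}$ or how the BJS score is derived from it.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    Both inputs are sequences of probabilities on the same support.
    With base-2 logarithms the result lies between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i = 0 contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, divergence))


same = js_distance([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])  # non-overlapping support
```

Under this convention a distance near 0 means the data barely moved the posterior away from the prior, while values closer to 1 indicate a strong shift.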