
A feature-based information-theoretic approach for detecting interpretable, long-timescale pairwise interactions from time series

Aria Nguyen, Oscar McMullin, Joseph T. Lizier, Ben D. Fulcher

TL;DR

The paper presents a feature-based information-theoretic framework for detecting long-timescale, feature-mediated interactions between time series. By transforming each source-window segment into a concise statistic $Z_t=f(\mathbf{X}_t^{(l)})$ and measuring $I(Z_t;Y_{t+1})$, it reduces the challenge of high-dimensional density estimation and yields interpretable insights via candidate time-series features (e.g., catch22). Across simulations with a stationary noise source and non-stationary dynamics (AR(3) and bimodal spiking), the feature-based method $\mathrm{MI}_F$ consistently outperforms traditional signal-space MI, $\mathrm{MI}_s$, in detecting dependencies, especially for short time series, high noise levels, and long interaction timescales; it can also suggest the underlying timescale of interaction. The approach offers a flexible, domain-adaptable tool for analyzing complex systems, with implications for fields from neuroscience to finance, where interactions may be mediated by statistical properties rather than raw signal values.
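To make the contrast between $\mathrm{MI}_f$ and $\mathrm{MI}_s$ concrete, here is a minimal Python sketch, assuming NumPy and a joint-Gaussian (linear) MI estimator in place of whatever estimator the paper actually uses. Each length-$l$ window of $x$ ending at time $t$ is paired with $y_{t+1}$; $\mathrm{MI}_f$ first reduces the window to a scalar $z_t$, while $\mathrm{MI}_s$ works on the full $l$-dimensional window. The helper names (`sliding_windows`, `gaussian_mi`, `mi_f`, `mi_s`) and the toy coupling through the window standard deviation are hypothetical choices for illustration.

```python
import numpy as np


def sliding_windows(x, l):
    """Row i holds the length-l window x[i], ..., x[i + l - 1]."""
    return np.lib.stride_tricks.sliding_window_view(x, l)


def gaussian_mi(a, b):
    """MI (nats) between a and b, ASSUMING they are jointly Gaussian."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    cov = np.cov(np.hstack([a, b]), rowvar=False)
    k = a.shape[1]

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    return 0.5 * (logdet(cov[:k, :k]) + logdet(cov[k:, k:]) - logdet(cov))


def mi_f(x, y, l, feature):
    """Feature-based MI: I(Z_t; Y_{t+1}) with Z_t = feature(window ending at t)."""
    windows = sliding_windows(x, l)[:-1]  # window ending at t pairs with y[t+1]
    z = np.array([feature(w) for w in windows])
    return gaussian_mi(z, y[l:])


def mi_s(x, y, l):
    """Signal-space MI: I(X_t^(l); Y_{t+1}) on the raw l-dimensional window."""
    windows = sliding_windows(x, l)[:-1]
    return gaussian_mi(windows, y[l:])


# Toy example (hypothetical): y[t+1] depends on the std of the previous 10 samples of x.
rng = np.random.default_rng(0)
T, l, beta = 2000, 10, 0.5
x = rng.standard_normal(T)
z_drive = sliding_windows(x, l).std(axis=1)
z_drive = (z_drive - z_drive.mean()) / z_drive.std()
y = rng.standard_normal(T)
y[l:] = z_drive[:-1] + beta * rng.standard_normal(T - l)

mi_f_val = mi_f(x, y, l, np.std)
mi_s_val = mi_s(x, y, l)
print(f"MI_f = {mi_f_val:.3f} nats, MI_s = {mi_s_val:.3f} nats")
```

Because the Gaussian estimator only sees linear dependence, it cannot pick up this even (variance-like) coupling from the raw window, so the gap in this toy partly reflects the estimator. The sketch illustrates the mechanics of the two quantities rather than reproducing the paper's comparison.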

Abstract

Quantifying relationships between components of a complex system is critical to understanding the rich network of interactions that characterize the behavior of the system. Traditional methods for detecting pairwise dependence of time series, such as Pearson correlation, Granger causality, and mutual information, are computed directly in the space of measured time-series values. But for systems in which interactions are mediated by statistical properties of the time series ("time-series features") over longer timescales, this approach can fail to capture the underlying dependence from limited and noisy time-series data, and can be challenging to interpret. Addressing these issues, here we introduce an information-theoretic method for detecting dependence between time series mediated by time-series features that provides interpretable insights into the nature of the interactions. Our method extracts a candidate set of time-series features from sliding windows of the source time series and assesses their role in mediating a relationship to values of the target process. Across simulations of three different generative processes, we demonstrate that our feature-based approach can outperform a traditional inference approach based on raw time-series values, especially in challenging scenarios characterized by short time-series lengths, high noise levels, and long interaction timescales. Our work introduces a new tool for inferring and interpreting feature-mediated interactions from time-series data, contributing to the broader landscape of quantitative analysis in complex systems research, with potential applications in various domains including but not limited to neuroscience, finance, climate science, and engineering.
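The candidate-feature search described above can be sketched as a loop over features with one permutation test per feature. This is a minimal sketch under stated assumptions: a hypothetical four-feature candidate set stands in for the paper's 24 capturing features, MI is estimated with a Gaussian formula, the null distribution comes from shuffling the target, and family-wise error is controlled with a Bonferroni correction, which may differ from the paper's procedure. The function names are illustrative only.

```python
import numpy as np


def gaussian_mi_1d(a, b):
    """MI (nats) between two scalar series, ASSUMING joint Gaussianity."""
    r = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - r ** 2)


def lag1_autocorr(w):
    """Lag-1 autocorrelation of a window."""
    w = w - w.mean()
    return np.dot(w[:-1], w[1:]) / np.dot(w, w)


def trev(w):
    """Mean cube of successive differences within a window."""
    return np.mean(np.diff(w) ** 3)


# Hypothetical candidate set (a stand-in for the paper's 24 capturing features).
CANDIDATES = {"mean": np.mean, "std": np.std, "AC1": lag1_autocorr, "trev": trev}


def feature_series(x, l, f):
    """z_t = f(x[t-l+1..t]) for every window whose successor y[t+1] exists."""
    windows = np.lib.stride_tricks.sliding_window_view(x, l)[:-1]
    return np.array([f(w) for w in windows], dtype=float)


def perm_pvalue(z, y_next, n_perm, rng):
    """One-sided permutation p-value for I(Z; Y_next), shuffling the target."""
    observed = gaussian_mi_1d(z, y_next)
    null = np.array([gaussian_mi_1d(z, rng.permutation(y_next)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


def mi_F_detect(x, y, l, alpha=0.05, n_perm=199, seed=0):
    """Flag a dependence if ANY candidate passes a Bonferroni-corrected test (assumed FWER control)."""
    rng = np.random.default_rng(seed)
    y_next = y[l:]
    pvals = {name: perm_pvalue(feature_series(x, l, f), y_next, n_perm, rng)
             for name, f in CANDIDATES.items()}
    detected = any(p < alpha / len(CANDIDATES) for p in pvals.values())
    return detected, pvals


# Usage: y is driven by the std of the preceding 10-sample window of x.
rng = np.random.default_rng(1)
T, l = 1000, 10
x = rng.standard_normal(T)
z = feature_series(x, l, np.std)
y = rng.standard_normal(T)
y[l:] = (z - z.mean()) / z.std() + 0.5 * rng.standard_normal(T - l)
detected, pvals = mi_F_detect(x, y, l)
print(detected, {k: round(float(v), 3) for k, v in pvals.items()})
```

Shuffling $y$ destroys its autocorrelation as well as the coupling, so for strongly autocorrelated targets a surrogate scheme that preserves temporal structure may be more appropriate than a plain permutation.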

Paper Structure (15 sections, 16 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: We introduce a feature-based formulation of mutual information (MI), denoted $\mathrm{MI}_f$, for detecting pairwise interactions between time series. This figure illustrates our method for detecting cases in which a target process, $Y$, is influenced by a statistical property of a recent time window of a source process, $X$. We contrast it with conventional MI, estimated directly from the signal space of the variables, which we denote $\mathrm{MI}_s$. (a) $\mathrm{MI}_s$ is computed from the observed time-series values of processes $X$ and $Y$ [see Eq. \ref{eqn:MI_s}]. (b) $\mathrm{MI}_f$ iterates through time-series segments of length $l$ of process $X$ and reduces each window to a single real-valued summary statistic $z_t$ [see Eq. \ref{eqn:feature}]. MI is then computed between the feature variable $Z_t$ and the target variable $Y_{t+1}$ [cf. Eq. \ref{eqn:MI_f}]. We can iterate over a set of candidate features for mapping $\mathbf{x}_t^{(l)}$ to $z_t$, allowing us to detect an interaction between $X$ and $Y$ when it is mediated by a candidate time-series property.
  • Figure 2: We simulated pairs of dynamical processes with feature-mediated interactions to evaluate the performance of pairwise dependence methods. (a) We simulated a source and a target time series, $\mathbf{x}$ and $\mathbf{y}$ respectively, with a ground-truth feature-mediated dependency. We first computed a feature time series, $\mathbf{\tilde{z}}$, by extracting a feature across a set of sliding windows of $\mathbf{x}$ (window size $\tilde{l}$; cf. Eq. \ref{eqn:driving_feature}). (Note our convention of using the tilde for ground-truth quantities: $\mathbf{\tilde{z}}$ is the ground-truth feature that drives the interaction between $X$ and $Y$, and $\tilde{l}$ is the ground-truth timescale of interaction.) The target time series $\mathbf{y}$ was then generated as a linear function of this feature after standardizing it (cf. Eq. \ref{eqn:driving_target}). (b) For a given timescale $l$ (from the set of timescales of interest), we computed feature-based MI, $\mathrm{MI}_f$, using a set $F$ of 24 capturing time-series features, each computed over sliding windows of size $l$ to yield a feature time series $\mathbf{z}$ (cf. Eq. \ref{eqn:MI_f}). For comparison, we computed $\mathrm{MI}_s$ between $\mathbf{x}_t^{(l)}$ and $y_{t+1}$ (cf. Eq. \ref{eqn:MI_s}). (c) We estimated the $p$-values of $\mathrm{MI}_{f_i}$ for each $f_i \in F$, and of $\mathrm{MI}_s$, using permutation testing. Each simulation was run 50 times to obtain capture rates for $\mathrm{MI}_{f_i}$ and $\mathrm{MI}_s$, defined as the percentage of runs in which the $p$-value obtained by each method is below its respective threshold (see Sec. \ref{subsec:simulation_studies}). This yields a capture rate for each of the 24 capturing features, as well as for $\mathrm{MI}_F$ as a whole: in a given simulation run, if any of the capturing features detects the dependence, the dependence is considered captured by $\mathrm{MI}_F$.
  • Figure 3: We studied three generative source processes, with target processes linearly coupled to local dynamical properties of each source process. For each process, we show example time-series realizations of length 1000. (a)--(c) A random noise process, Eq. \ref{eqn:random_noise}. (a) A time-series realization of the process. (b) An example feature time series for the feature labeled trev, $\mathbf{\tilde{z}}_{\texttt{trev}}$ (defined in Eq. \ref{eqn:trev}), which computes the average of the cube of successive time-series differences, computed over sliding windows of duration $\tilde{l} = 10$ of $\mathbf{x}$ (cf. Eq. \ref{eqn:driving_feature}). (c) The simulated target process, $\mathbf{y}$ (a noisy linear function of $\mathbf{\tilde{z}}_{\texttt{trev}}$, as defined in Eq. \ref{eqn:driving_target}). (d)--(g) A non-stationary third-order autoregressive, AR(3), source process with a time-varying lag-1 coefficient, $\phi_1$, Eq. \ref{eqn:ar3}. We plot example realizations of: (d) the $\boldsymbol{\phi}_1$ time series, generated by a piecewise-constant function [see Eq. \ref{eqn:ar3_phi1}]; (e) the source time series, $\mathbf{x}$; (f) the driving feature that mediates the coupling to the target process, the lag-1 autocorrelation $\mathbf{\tilde{z}}_{\texttt{AC1}}$ (computed over sliding windows of duration $\tilde{l} = 100$ of $\mathbf{x}$); and (g) the target time series, $\mathbf{y}$ [cf. Eq. \ref{eqn:driving_target}]. (h)--(l) A non-stationary bimodal spiking process whose spike rate switches between two values over time [see Eqs. \ref{eqn:bimodal_spike_rate_switch}, \ref{eqn:bimodal_spike_rate}, and \ref{eqn:bimodal_spike_x} for the mathematical definition of this process and how its samples are generated].
We plot example realizations of: (h) the spike-rate time series $\mathbf{r}$, which takes one of two values, $0.05$ or $0.1$, with switching between the two values governed by a Bernoulli distribution with probability $p$ [see Eqs. \ref{eqn:bimodal_spike_rate_switch} and \ref{eqn:bimodal_spike_rate}]; (i) the spike time series, $\mathbf{s}$, which indicates the presence (1) or absence (0) of a spike at each time point and is generated by sampling from a Bernoulli distribution with probability $\mathbf{r}$; (j) the source time series $\mathbf{x}$, generated through a noisy rescaling of the spike time series $\mathbf{s}$, as given by Eq. \ref{eqn:bimodal_spike_x}; (k) the driving feature, the number of spikes $\mathbf{\tilde{z}}_{\texttt{spikenum}}$ (defined as the number of times within a time-series window of length $\tilde{l}$ that the sample value exceeds a threshold of four); and (l) the target process, $\mathbf{y}$, a noisy linear function of $\mathbf{\tilde{z}}_{\texttt{spikenum}}$, as in Eq. \ref{eqn:driving_target}.
  • Figure 4: $\mathrm{MI}_F$ can detect feature-mediated interactions from time-series data more efficiently than $\mathrm{MI}_s$ when the capturing feature set includes the driving feature, shown here for simulations using a random noise source process. (a) We simulated interactions mediated by each of 24 driving time-series features [Lubba2019Catch22] and applied $\mathrm{MI}_F$ with 24 capturing features to each simulation. Features with zero standard deviation (or very few distinct values) were removed from the analysis, leaving the 18 features shown here. The matrix displays the capture rate for each combination of capturing feature (rows) and driving feature (columns) for a random noise process [Eq. \ref{eqn:random_noise}], with time-series length $T = 1000$, interaction timescale $\tilde{l} = 10$ samples, and noise strength $\beta = 0.2$. Features are ordered by their similarity to each other, measured as the correlation of feature values across time series for each pair of features. Cells are colored by the capture rate of $\mathrm{MI}_f$ for each combination of capturing and driving feature (darker colors indicate higher capture rates). (b)--(e) Capture rates for four selected features, namely autocorrelation timescale (acf_timescale), a time-reversibility metric (trev), histogram mode (mode_5), and mean (mean), plotted as a function of time-series length $T$ (note the logarithmic scale for $T$). Each panel shows the capture rates for one driving feature: (b) acf_timescale, (c) mean, (d) mode_5, and (e) trev. Dashed dark green lines show the capture rate of $\mathrm{MI}_F$ using the full set of 24 candidate features (see Sec. \ref{subsec:simulation_studies}), i.e., a dependence is considered detected by $\mathrm{MI}_F$ if any of the 24 capturing features detects a statistical dependence (controlling the family-wise error rate at $\alpha = 0.05$). Dashed black lines show the capture rate of $\mathrm{MI}_s$.
Dashed grey lines show the capture rate obtained using the noise time series $\mathbf{z}_\mathrm{null}$ [defined in Eq. \ref{eqn:znull}]. Thick lines mark cases in which the capturing feature is the same as the driving feature.
  • Figure 5: $\mathrm{MI}_F$ is more robust to noise than $\mathrm{MI}_s$. We evaluated the performance of $\mathrm{MI}_F$ and $\mathrm{MI}_s$ across noise strengths for time series of length $T = 1000$ samples and interaction timescale $\tilde{l} = 100$ samples, for two processes: (a) an AR(3) process, Eq. \ref{eqn:ar3}, where the driving feature is the lag-1 autocorrelation; and (b) a bimodal spiking process, Eq. \ref{eqn:bimodal_spike_x}, where the driving feature is the number of spikes [defined in Sec. \ref{subsec:generative_processes}(c)]. Capture rates are plotted as a function of noise strength over the range $0.2 \leq \beta \leq 0.8$ for four selected capturing features. Pink lines show the capture rate when the capturing feature is the driving feature. Dashed dark green lines show the capture rate of $\mathrm{MI}_F$ using the set of 24 candidate features (see Sec. \ref{subsec:simulation_studies}). Dashed black lines show the capture rate of $\mathrm{MI}_s$. Dashed grey lines show the capture rate obtained using the noise time series $\mathbf{z}_\mathrm{null}$.
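The simulation setup described in the Figure 2 and Figure 3 captions can be sketched for two of the three source processes: a noise source driven through trev over $\tilde{l} = 10$, and the bimodal spiking source driven through spikenum over $\tilde{l} = 100$. Several details are assumptions not given in the captions and are labeled in the code: the noise source is taken as i.i.d. Gaussian, the spike rate flips between $0.05$ and $0.1$ with per-step probability `p_switch`, the noisy rescaling is `amplitude * s + noise_sd * eps` with hypothetical values, and the target takes the form $y_{t+1} = \hat{z}_t + \beta\,\varepsilon_t$, where $\hat{z}$ is the standardized driving feature. All function names are hypothetical.

```python
import numpy as np


def trev(w):
    """Mean cube of successive differences within a window."""
    return np.mean(np.diff(w) ** 3)


def spikenum(w, threshold=4.0):
    """Number of samples in the window that exceed the threshold (four, per the caption)."""
    return np.sum(w > threshold)


def noise_source(T, rng):
    """ASSUMED i.i.d. standard-normal noise source."""
    return rng.standard_normal(T)


def bimodal_spiking_source(T, p_switch, rng, rates=(0.05, 0.1), amplitude=5.0, noise_sd=1.0):
    """ASSUMED switching rule: the rate flips with probability p_switch at each step.
    Spikes s_t ~ Bernoulli(r_t); x = amplitude * s + noise_sd * eps (hypothetical values)."""
    r = np.empty(T)
    state = 0
    for t in range(T):
        if rng.random() < p_switch:
            state = 1 - state
        r[t] = rates[state]
    s = (rng.random(T) < r).astype(int)
    x = amplitude * s + noise_sd * rng.standard_normal(T)
    return x, r, s


def driven_target(x, l_true, feature, beta, rng):
    """ASSUMED form: y[t+1] = standardized feature of the window ending at t + beta * noise."""
    windows = np.lib.stride_tricks.sliding_window_view(x, l_true)[:-1]
    z = np.array([feature(w) for w in windows], dtype=float)
    z = (z - z.mean()) / z.std()
    y = rng.standard_normal(len(x))  # the first l_true samples have no driving window
    y[l_true:] = z + beta * rng.standard_normal(len(z))
    return y, z


rng = np.random.default_rng(0)
x1 = noise_source(1000, rng)
y1, z1 = driven_target(x1, 10, trev, beta=0.2, rng=rng)

x2, r2, s2 = bimodal_spiking_source(1000, p_switch=0.01, rng=rng)
y2, z2 = driven_target(x2, 100, spikenum, beta=0.2, rng=rng)
```

The pairs `(x1, y1)` and `(x2, y2)` can then be passed to a feature-based or signal-space MI estimator at a range of candidate timescales $l$; repeating this over many seeds and counting significant detections gives a capture rate in the sense of the Figure 2 caption.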