A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting

Christopher Salazar; Ashis G. Banerjee

TL;DR

This paper addresses why the forecasting performance of RNNs varies across time series by introducing a distance correlation framework that tracks information flow through the RNN activation layers via $\hat{R}(\mathbf{A}_{t}^{(p)}, \mathbf{Y})$ and related metrics. The analysis shows that activation layers can learn lag structures but gradually lose that information over roughly 5–6 layers, which limits effectiveness on series with large lag structures, and that moving average (MA) and GARCH processes are poorly captured. Heatmaps of $\hat{R}$ across hyperparameters show that the input window size $T$ often matters more than other choices for forecasting accuracy. The framework lets practitioners assess the suitability of an RNN for a given univariate time series before committing to full training and evaluation.

Abstract

Time series forecasting has received a lot of attention, with recurrent neural networks (RNNs) being among the most widely used models due to their ability to handle sequential data. Previous studies on RNN time series forecasting, however, show inconsistent outcomes and offer few explanations for performance variations among datasets. In this paper, we provide an approach to link time series characteristics with RNN components via the versatile metric of distance correlation. This metric allows us to examine the information flow through the RNN activation layers so that their performance can be interpreted and explained. We empirically show that the RNN activation layers learn the lag structures of time series well. However, they gradually lose this information over the span of a few consecutive layers, thereby worsening the forecast quality for series with large lag structures. We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes. Finally, we generate heatmaps for visual comparisons of the activation layers for different choices of the network hyperparameters to identify which of them affect the forecast performance. Our findings can, therefore, aid practitioners in assessing the effectiveness of RNNs for given time series data without actually training and evaluating the networks.
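To make the central metric concrete, the following is a minimal sketch of the empirical distance correlation between two paired samples, for example the outputs of one activation layer and the ground truth horizon values. It assumes the standard biased (V-statistic) sample estimator built from double-centered pairwise Euclidean distance matrices; the exact estimator used in the paper may differ, and the function names are illustrative only.

```python
import numpy as np


def _centered_distances(Z):
    """Double-centered pairwise Euclidean distance matrix of the rows of Z."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # treat a 1-D array as n scalar observations
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()


def distance_correlation(X, Y):
    """Empirical distance correlation (assumed biased estimator), in [0, 1].

    X and Y hold n paired observations along their first axis; their
    feature dimensions may differ.
    """
    A = _centered_distances(X)
    B = _centered_distances(Y)
    if A.shape != B.shape:
        raise ValueError("X and Y must contain the same number of observations")
    dcov2 = (A * B).mean()                    # squared sample distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()   # product of squared distance variances
    if dvar2 <= 0.0:
        return 0.0                            # a constant sample carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))
```

Because the estimator only needs pairwise distances, `X` can be a matrix of hidden-unit outputs with one row per sample while `Y` is a vector of scalar targets, which is what makes the metric convenient for comparing activation layers of arbitrary width against forecast targets.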

Paper Structure (17 sections, 12 equations, 15 figures, 4 tables)

Introduction
Related Works
Methodology
Time Series Forecasting
RNN Architecture
Distance Correlation
Analysis of RNN Activation Layers
Experiments
Time Series Data
RNN behavior under various time series processes
Implementation Details
Time Series Process Results
Temporal information loss in RNNs
Visualizing differences in time series RNN models
Discussion
...and 2 more sections

Figures (15)

Figure 1: Overview of the use of distance correlation to examine time series forecasting using a recurrent neural network (RNN). We begin with a time series history that comprises the inputs $x_{t}$ for the RNN and the predicted outputs $\hat{y}_{T}$. The outputs of each activation layer and the ground truth values, $y_{T}$, are extracted and processed with distance correlation. The resulting values are then used to generate correlation plots for analyzing the behavior of the activation layers and heatmaps for comparing different RNN models.
Figure 2: An example of an autocorrelation plot for the sunspot time series data [center_2009], with a blue shaded significance level band. Lags that fall outside the significance level band are considered important to include in a forecasting model. The plot also provides a way to describe the time series data, as the cyclical pattern of the lags indicates a seasonal characteristic in the sunspot observations.
Figure 3: Time series sampling strategy, shown on a plot of the full univariate time series. The full time series is partitioned into an 80:20 training:test split. Each of the training and test splits is further divided into input-output samples, where each sample is generated via a sliding window. In this example, the sliding window is of size $T = 5$ and the prediction horizon is $H = 1$.
Figure 4: Time series input-output structure with an unfolded RNN for the $i$th sample and training epoch $p$. The $i$th input of the RNN, $\mathbf{x}_{i} = [x_{1,i}, x_{2,i}, \dots, x_{T,i}]$, produces activation layer outputs $\mathbf{a}_{t,i}^{(p)}$ for $t = 0, \dots, T$ using a predetermined activation function $f$. The final activation layer output $\mathbf{a}_{T,i}^{(p)}$ is fed into a dense layer to produce a single-step-ahead forecast value $\hat{\mathbf{y}}_{i} = \{\hat{x}_{T+H,i}\}$ for $H = 1$.
Figure 5: AR time series plots with RNN forecasts of the test set (top row). The mean values of the ACF and of the distance correlations between the outputs of the activation layers and the ground truth horizon values over 50 runs are shown in the bar plots (bottom row). For the AR(1) process, we observe high correlation values for both metrics and a gradual increase as the layer number increases. For the AR(5) process, layers 1, 6, 11, and 16 have high correlations, but the values cyclically diminish over every 5 layers. This aligns with the high ACF values, whose corresponding lags occur at 20, 15, 10, and 5. Both de-standardized time series plots have well-fitting forecasts from the RNN, which corresponds to the relatively low MSE and MAPE scores.
...and 10 more figures
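The sampling strategy and unfolded RNN described in the captions of Figures 3 and 4 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the hidden size, the `tanh` activation, and the random untrained weights are all hypothetical choices made only to show how the windowed samples and the per-step activation outputs $\mathbf{a}_{t}$ for $t = 0, \dots, T$ are laid out.

```python
import numpy as np


def sliding_windows(series, T=5, H=1):
    """Cut a 1-D series into inputs of length T and targets H steps ahead."""
    series = np.asarray(series, dtype=float)
    n = len(series) - T - H + 1
    if n <= 0:
        raise ValueError("series is too short for the requested T and H")
    X = np.stack([series[i:i + T] for i in range(n)])
    y = np.array([series[i + T + H - 1] for i in range(n)])
    return X, y


def split_then_window(series, train_frac=0.8, T=5, H=1):
    """Split chronologically (80:20 by default), then window each part separately."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_frac)
    return sliding_windows(series[:cut], T, H), sliding_windows(series[cut:], T, H)


def rnn_activations(X, hidden=8, seed=0):
    """Unfold a simple tanh RNN over each window and keep every step's output.

    Weights are random and untrained (an assumption for illustration only).
    Returns a list of T + 1 arrays of shape (n_samples, hidden); index 0 is
    the all-zero initial state a_0 and index T is the final layer output.
    """
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W_x = rng.normal(scale=0.5, size=(1, hidden))       # input-to-hidden weights
    W_a = rng.normal(scale=0.5, size=(hidden, hidden))  # hidden-to-hidden weights
    b = np.zeros(hidden)
    a = np.zeros((n, hidden))
    outputs = [a]
    for t in range(T):
        a = np.tanh(X[:, [t]] @ W_x + a @ W_a + b)
        outputs.append(a)
    return outputs
```

Windowing each split separately keeps test-set values out of every training sample. Each array returned by `rnn_activations` has one row per sample, so it can be paired with the target vector `y` and passed to a dependence measure such as distance correlation, one activation layer at a time.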
