A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting
Christopher Salazar, Ashis G. Banerjee
TL;DR
Addresses why RNNs' forecasting performance varies across time series by introducing a distance-correlation framework that tracks information flow through RNN activation layers via $\hat{R}(\mathbf{A}_{t}^{(p)}, \mathbf{Y})$ and related metrics. The method reveals that activation layers can learn lag structures but gradually forget this information over roughly 5–6 layers, which limits effectiveness on series with large lag structures, and that MA and GARCH processes are poorly captured. Heatmaps of $\hat{R}$ across hyperparameters show that the input window size $T$ often matters more for forecasting accuracy than other choices, enabling practitioners to pre-assess RNN suitability for a given time series. The framework offers a practical way to diagnose and guide RNN design for univariate time-series forecasting without training and evaluating the networks.
Abstract
Time series forecasting has received a lot of attention, with recurrent neural networks (RNNs) being among the most widely used models due to their ability to handle sequential data. Previous studies on RNN time series forecasting, however, show inconsistent outcomes and offer few explanations for performance variations across datasets. In this paper, we provide an approach to link time series characteristics with RNN components via the versatile metric of distance correlation. This metric allows us to examine the information flow through the RNN activation layers and thereby interpret and explain their performance. We empirically show that the RNN activation layers learn the lag structures of time series well. However, they gradually lose this information over the span of a few consecutive layers, thereby worsening the forecast quality for series with large lag structures. We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes. Finally, we generate heatmaps for visual comparisons of the activation layers for different choices of the network hyperparameters to identify which of them affect the forecast performance. Our findings can, therefore, aid practitioners in assessing the effectiveness of RNNs for given time series data without actually training and evaluating the networks.
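To make the central metric concrete, the following is a minimal sketch of the standard sample distance correlation of Székely et al. (double-centered pairwise Euclidean distance matrices), written in plain Python. It is an illustration under assumptions rather than the authors' implementation: the function names, the toy data, and the choice of this particular estimator are hypothetical, and the paper may compute $\hat{R}$ differently. Each argument is a list of equally many sample vectors, for example one row of activation outputs per sample paired with the corresponding row of forecast targets.

```python
import math


def _centered_distances(samples):
    """Double-centered Euclidean distance matrix for a list of vectors."""
    n = len(samples)
    d = [[math.dist(u, v) for v in samples] for u in samples]
    row_mean = [sum(row) / n for row in d]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so the column means equal the row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y.

    x and y are lists of vectors (lists of floats) with the same number
    of rows; the vector dimensions of x and y may differ.
    Returns a value in [0, 1]; 0.0 if either sample has no spread.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    # Squared sample distance covariance and distance variances.
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy data (hypothetical): a symmetric input and its square, a purely
# nonlinear dependence for which the Pearson correlation is zero.
inputs = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
targets = [[v[0] ** 2] for v in inputs]
print(distance_correlation(inputs, targets))
```

Unlike the Pearson coefficient, this quantity is defined for vectors of different dimensions and responds to nonlinear dependence, which is what makes it suitable for comparing a layer of activations against a forecast horizon. The nested loops cost $O(n^2)$ time and memory in the number of samples, so a vectorized implementation would be preferable for larger experiments.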
