A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, Garrison Cottrell

TL;DR

The paper tackles time series forecasting with multiple exogenous drivers (NARX) and the challenge of modeling long-range dependencies. It introduces a dual-stage attention-based RNN (DA-RNN) with an input-attentive encoder and a temporally attentive decoder to select relevant driving series and time steps. Empirical results on the SML 2010 and NASDAQ 100 datasets show DA-RNN outperforms baselines and provides interpretability via attention weights, with robustness to noisy inputs. The approach offers accurate, explainable predictions and suggests broader applicability beyond forecasting.

Abstract

The Nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Despite the fact that various NARX models have been developed, few of them can capture the long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
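
As a rough formalization of the NARX setup described in the abstract, the prediction target can be written as a nonlinear function of the past target values and the current and past driving series. The symbols below are chosen for illustration: $F$ stands for the unknown mapping to be learned, $T$ for the window length, and $\textbf{x}_t \in \mathbb{R}^n$ for the $n$ driving series at time $t$.

$$\hat{y}_T = F(y_1, \ldots, y_{T-1}, \textbf{x}_1, \ldots, \textbf{x}_T)$$

Under this reading, the two issues named in the abstract correspond to choosing which components of each $\textbf{x}_t$ matter (input attention) and which time steps $1, \ldots, T$ matter (temporal attention).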

Paper Structure

This paper contains 14 sections, 23 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Graphical illustration of the dual-stage attention-based recurrent neural network. (a) The input attention mechanism computes the attention weights $\alpha^k_{t}$ for multiple driving series $\{\textbf{x}^1, \textbf{x}^2, \cdots, \textbf{x}^n\}$ conditioned on the previous hidden state $\textbf{h}_{t-1}$ in the encoder and then feeds the newly computed $\tilde{\textbf{x}}_t = (\alpha^1_t x^1_t, \alpha^2_t x^2_t, \cdots, \alpha^n_t x^n_t)^{\top}$ into the encoder LSTM unit. (b) The temporal attention system computes the attention weights $\beta^i_t$ over the encoder time steps $i$ based on the previous decoder hidden state $\textbf{d}_{t-1}$ and represents the input information as a weighted sum of the encoder hidden states across all the time steps. The generated context vector $\textbf{c}_{t}$ is then used as an input to the decoder LSTM unit. The output $\hat{y}_T$ of the last decoder LSTM unit is the predicted result.
  • Figure 2: NASDAQ 100 Index vs. Time. Encoder-Decoder (top) and Attention RNN (middle) are compared with DA-RNN (bottom).
  • Figure 3: Plot of the input attention weights for DA-RNN from a single encoder time step. The first 81 weights are on 81 original driving series and the last 81 weights are on 81 noisy driving series. (left) Input attention weights on NASDAQ100 training set. (right) Input attention weights on NASDAQ100 test set.
  • Figure 4: RMSE vs. length of time steps $T$ over SML 2010 (left) and NASDAQ 100 Stock (right).
  • Figure 5: RMSE vs. size of hidden states of encoder/decoder over SML 2010 (left) and NASDAQ 100 Stock (right).
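
The two attention stages described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the additive scoring form `v @ tanh(W @ state + U @ item)`, the random weights, the omission of the LSTM cell states and of the LSTM updates themselves, and all function and variable names (`input_attention`, `temporal_attention`, `W`, `U`, `v`) are assumptions made for the example. It only shows how softmax-normalized weights reweight the driving series at one encoder step and how a context vector is formed as a weighted sum of encoder hidden states.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def input_attention(x_window, h_prev, W, U, v):
    """Weights over the n driving series (stage 1).

    x_window: (T, n) window of driving series; column k is series x^k.
    h_prev:   (m,) previous encoder hidden state.
    W: (T, m), U: (T, T), v: (T,) are hypothetical scoring parameters.
    Returns alpha with shape (n,), non-negative and summing to 1.
    """
    n = x_window.shape[1]
    scores = np.array(
        [v @ np.tanh(W @ h_prev + U @ x_window[:, k]) for k in range(n)]
    )
    return softmax(scores)


def temporal_attention(enc_states, d_prev, W, U, v):
    """Weights over the T encoder hidden states (stage 2).

    enc_states: (T, m) encoder hidden states, one row per time step.
    d_prev:     (p,) previous decoder hidden state.
    W: (m, p), U: (m, m), v: (m,) are hypothetical scoring parameters.
    Returns (beta, context): beta has shape (T,), context has shape (m,).
    """
    scores = np.array([v @ np.tanh(W @ d_prev + U @ h) for h in enc_states])
    beta = softmax(scores)
    context = beta @ enc_states  # weighted sum of encoder hidden states
    return beta, context


# Toy dimensions (assumed): window length T, n driving series,
# encoder hidden size m, decoder hidden size p.
rng = np.random.default_rng(0)
T, n, m, p = 10, 4, 8, 6

x = rng.normal(size=(T, n))
h_prev = rng.normal(size=m)
alpha = input_attention(
    x, h_prev,
    rng.normal(size=(T, m)), rng.normal(size=(T, T)), rng.normal(size=T),
)
x_tilde = alpha * x[0]  # reweighted driving-series values at one time step

enc_states = rng.normal(size=(T, m))  # stand-in for real encoder outputs
d_prev = rng.normal(size=p)
beta, context = temporal_attention(
    enc_states, d_prev,
    rng.normal(size=(m, p)), rng.normal(size=(m, m)), rng.normal(size=m),
)
```

In a full model the weights would be learned jointly with the encoder and decoder LSTMs, and `alpha` would be recomputed at every encoder step from the updated hidden state. Inspecting `alpha` and `beta` directly is what makes plots like Figure 3 possible.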