A Policy Gradient-Based Sequence-to-Sequence Method for Time Series Prediction

Qi Sima, Xinze Zhang, Yukun Bao, Siyue Yang, Liang Shen

TL;DR

This work tackles multi-step-ahead time series forecasting by addressing exposure bias in sequence-to-sequence models. It introduces PG-S2S, which combines an auxiliary model pool with a policy-gradient RL agent to adaptively select the decoder’s inputs at each step, thereby improving long-horizon accuracy and stability. Empirical results across six datasets show that PG-S2S outperforms standard training approaches and remains robust across different RNN units, demonstrating the practical value of learned input selection in S2S forecasting. The approach offers a principled framework to leverage diverse predictors and dynamic context for more reliable time-series predictions.

Abstract

Sequence-to-sequence architectures built upon recurrent neural networks have become a standard choice for multi-step-ahead time series prediction. In these models, the decoder produces future values conditioned on contextual inputs, typically either actual historical observations (ground truth) or previously generated predictions. During training, feeding ground-truth values helps stabilize learning but creates a mismatch between training and inference conditions, known as exposure bias, since such true values are inaccessible during real-world deployment. On the other hand, using the model's own outputs as inputs at test time often causes errors to compound rapidly across prediction steps. To mitigate these limitations, we introduce a new training paradigm grounded in reinforcement learning: a policy gradient-based method to learn an adaptive input selection strategy for sequence-to-sequence prediction models. Auxiliary models first synthesize plausible input candidates for the decoder, and a trainable policy network optimized via policy gradients dynamically chooses the most beneficial inputs to maximize long-term prediction performance. Empirical evaluations on diverse time series datasets confirm that our approach enhances both accuracy and stability in multi-step forecasting compared to conventional methods.
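The selection mechanism described in the abstract can be illustrated with a small REINFORCE sketch. This is not the authors' implementation: the policy here is a tabular softmax (one row of logits per decoding step) rather than a policy network, the "auxiliary pool" is simulated by fixed offsets from the target (`biases`, a hypothetical stand-in), and the decoder is replaced by a toy recurrence in which an inaccurate input also degrades later predictions. All function and variable names are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rollout(theta, targets, biases, rng=None):
    """Decode H steps and return the chosen candidates and per-step rewards.

    Toy decoder (assumption): h_t = 0.5 * h_{t-1} + 0.5 * x_t, prediction = h_t.
    Candidate k at step t is targets[t] + biases[k] (simulated auxiliary pool).
    With rng=None the policy acts greedily; otherwise it samples.
    """
    H, K = theta.shape
    probs = softmax(theta)
    h = targets[0]  # assume the last observed value equals the first target
    actions, rewards = [], []
    for t in range(H):
        if rng is None:
            a = int(np.argmax(probs[t]))
        else:
            a = int(rng.choice(K, p=probs[t]))
        x = targets[t] + biases[a]          # decoder input selected by the policy
        h = 0.5 * h + 0.5 * x               # toy decoder state carries errors forward
        actions.append(a)
        rewards.append(-(h - targets[t]) ** 2)  # reward = negative squared error
    return np.array(actions), np.array(rewards)

def train(theta, targets, biases, episodes=3000, lr=0.05, seed=0):
    """REINFORCE with a running-mean baseline; updates theta in place."""
    rng = np.random.default_rng(seed)
    H, K = theta.shape
    baseline = np.zeros(H)
    for _ in range(episodes):
        actions, rewards = rollout(theta, targets, biases, rng)
        returns = np.cumsum(rewards[::-1])[::-1]  # return-to-go from each step
        probs = softmax(theta)
        for t in range(H):
            grad_logp = -probs[t]           # d log pi(a_t) / d theta[t] = onehot(a_t) - probs[t]
            grad_logp[actions[t]] += 1.0
            theta[t] += lr * (returns[t] - baseline[t]) * grad_logp
        baseline += 0.05 * (returns - baseline)  # slowly track the mean return
    return theta

targets = np.ones(4)                   # constant toy series, horizon H = 4
biases = np.array([1.0, 0.5, 0.0])     # candidate 0 is the worst, candidate 2 is exact
theta = np.zeros((4, 3))               # uniform policy; greedy tie-break picks candidate 0

_, r_before = rollout(theta, targets, biases)
theta = train(theta, targets, biases)
_, r_after = rollout(theta, targets, biases)
print("greedy return before:", r_before.sum(), "after:", r_after.sum())
```

Because the reward at each step depends on the decoder state, the return-to-go credits an input choice for its effect on later steps as well, which is the "long-term prediction performance" aspect the abstract refers to. A real setup would condition the policy on the decoder's hidden state and draw candidates from trained auxiliary models.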

Paper Structure

This paper contains 17 sections, 19 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Basic framework of S2S model with policy gradient.
  • Figure 2: Illustration of asynchronous training.
  • Figure 3: Training process of agent on different prediction tasks.
  • Figure 4: ETTh2-H24: The percentage of models in the training and validation datasets that are selected by the agent.