A Policy Gradient-Based Sequence-to-Sequence Method for Time Series Prediction
Qi Sima, Xinze Zhang, Yukun Bao, Siyue Yang, Liang Shen
TL;DR
This work tackles multi-step-ahead time series forecasting by addressing exposure bias in sequence-to-sequence (S2S) models. It introduces PG-S2S, which combines a pool of auxiliary models with a policy-gradient reinforcement learning (RL) agent that adaptively selects the decoder's input at each step, improving long-horizon accuracy and stability. Across six datasets, PG-S2S outperforms standard training approaches and remains robust across different RNN units, demonstrating the practical value of learned input selection in S2S forecasting. The approach offers a principled framework for leveraging diverse predictors and dynamic context to obtain more reliable time series predictions.
Abstract
Sequence-to-sequence architectures built upon recurrent neural networks have become a standard choice for multi-step-ahead time series prediction. In these models, the decoder produces future values conditioned on contextual inputs, typically either actual historical observations (ground truth) or previously generated predictions. During training, feeding ground-truth values helps stabilize learning but creates a mismatch between training and inference conditions, known as exposure bias, because such true values are unavailable during real-world deployment. Conversely, feeding the model's own outputs back as inputs, as is necessary at test time, often causes errors to compound rapidly across prediction steps. To mitigate these limitations, we introduce a new training paradigm grounded in reinforcement learning: a policy-gradient-based method that learns an adaptive input selection strategy for sequence-to-sequence prediction models. Auxiliary models first synthesize plausible input candidates for the decoder, and a trainable policy network optimized via policy gradients dynamically chooses the most beneficial inputs to maximize long-term prediction performance. Empirical evaluations on diverse time series datasets confirm that our approach improves both accuracy and stability in multi-step forecasting compared with conventional methods.
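The input-selection idea described in the abstract can be illustrated with a deliberately small, stdlib-only Python sketch. It is not the authors' implementation: the fixed AR(1) "decoder", the two simulated auxiliary models (modeled here as noisy proxies of the current true value, with noise levels 0.1 and 2.0), and the state-independent softmax policy are all illustrative assumptions, and the names `rollout` and `train` are hypothetical. At each decoding step the agent picks one of three candidate inputs (the decoder's own previous prediction, or one of the two auxiliary estimates); REINFORCE with a running-average baseline then adjusts the policy logits using the episode return, taken here as the negative mean squared error over the forecast horizon.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    cum = 0.0
    for k, p in enumerate(probs):
        cum += p
        if r < cum:
            return k
    return len(probs) - 1


def rollout(logits, rng, horizon=10, phi=0.9, noise=0.5, aux_noise=(0.1, 2.0)):
    """Decode one toy episode; return (mean squared error, chosen actions).

    Assumption: the "decoder" is a fixed AR(1) map pred = phi * input,
    standing in for a trained S2S decoder.
    """
    probs = softmax(logits)
    y = rng.gauss(0.0, 1.0)   # last observed value, known to the decoder
    own_pred = y              # free-running input starts from the last observation
    sq_errors, actions = [], []
    for _ in range(horizon):
        # Candidate decoder inputs for this step (hypothetical pool):
        #   0 -> the decoder's own previous prediction (free-running)
        #   1 -> simulated accurate auxiliary model (small noise)
        #   2 -> simulated inaccurate auxiliary model (large noise)
        candidates = [own_pred,
                      y + rng.gauss(0.0, aux_noise[0]),
                      y + rng.gauss(0.0, aux_noise[1])]
        a = sample(probs, rng)
        pred = phi * candidates[a]              # decoder output for the next step
        y = phi * y + rng.gauss(0.0, noise)     # true next value of the AR(1) series
        sq_errors.append((y - pred) ** 2)
        actions.append(a)
        own_pred = pred
    return sum(sq_errors) / horizon, actions


def train(episodes=2000, lr=0.05, seed=0):
    """REINFORCE with a running-average baseline; returns learned probabilities."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0
    for ep in range(episodes):
        mse, actions = rollout(logits, rng)
        ret = -mse                      # episode return: negative mean squared error
        if ep == 0:
            baseline = ret              # initialise the baseline from the first return
        advantage = ret - baseline
        probs = softmax(logits)         # same policy that generated the episode
        # Gradient of sum_t log pi(a_t) w.r.t. logit k is sum_t (1[a_t == k] - pi_k).
        for k in range(len(logits)):
            score = sum((1.0 if a == k else 0.0) - probs[k] for a in actions)
            logits[k] += lr * advantage * score
        baseline += 0.05 * (ret - baseline)
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

In this toy setting the learned probabilities should come to favor the accurate auxiliary model (index 1) over free-running (index 0), because the decoder's own prediction errors compound across steps, while the noisy auxiliary model (index 2) is quickly suppressed. A full method would presumably condition the policy on the decoder state and train the sequence-to-sequence model jointly; this sketch omits both.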
