A Policy Gradient-Based Sequence-to-Sequence Method for Time Series Prediction

Qi Sima; Xinze Zhang; Yukun Bao; Siyue Yang; Liang Shen

TL;DR

This work tackles multi-step-ahead time series forecasting by addressing exposure bias in sequence-to-sequence (S2S) models. It introduces PG-S2S, which combines an auxiliary model pool with a policy-gradient reinforcement learning agent that adaptively selects the decoder's input at each step, improving long-horizon accuracy and stability. Empirical results on six datasets show that PG-S2S outperforms standard training approaches and remains robust across different RNN units. The approach provides a framework for combining diverse predictors with dynamic context through learned input selection.

Abstract

Sequence-to-sequence architectures built upon recurrent neural networks have become a standard choice for multi-step-ahead time series prediction. In these models, the decoder produces future values conditioned on contextual inputs, typically either actual historical observations (ground truth) or previously generated predictions. During training, feeding ground-truth values helps stabilize learning but creates a mismatch between training and inference conditions, known as exposure bias, since such true values are inaccessible during real-world deployment. On the other hand, using the model's own outputs as inputs at test time often causes errors to compound rapidly across prediction steps. To mitigate these limitations, we introduce a new training paradigm grounded in reinforcement learning: a policy gradient-based method to learn an adaptive input selection strategy for sequence-to-sequence prediction models. Auxiliary models first synthesize plausible input candidates for the decoder, and a trainable policy network optimized via policy gradients dynamically chooses the most beneficial inputs to maximize long-term prediction performance. Empirical evaluations on diverse time series datasets confirm that our approach enhances both accuracy and stability in multi-step forecasting compared to conventional methods.
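The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy setting in which a per-step softmax policy picks one of several candidate decoder inputs, the reward is the negative absolute error of the chosen candidate (a stand-in for the decoder's prediction error), and the policy is updated with the standard REINFORCE rule plus a running baseline. The function name `train_selector`, the candidate values, and all hyperparameters are hypothetical. Unlike a real decoder, the toy does not let a choice at one step influence later steps.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(candidates, targets, episodes=5000, lr=0.05, seed=0):
    """REINFORCE with a running baseline over a per-step softmax policy.

    candidates[t][k]: k-th candidate decoder input at step t (hypothetical
                      stand-ins for outputs of an auxiliary model pool)
    targets[t]:       true value at step t, available only during training
    """
    rng = random.Random(seed)
    horizon, n_sources = len(candidates), len(candidates[0])
    theta = [[0.0] * n_sources for _ in range(horizon)]  # policy logits
    baseline = [0.0] * horizon  # running mean of the return at each step
    for _ in range(episodes):
        probs, actions, rewards = [], [], []
        for t in range(horizon):
            p = softmax(theta[t])
            a = rng.choices(range(n_sources), weights=p)[0]
            # Toy reward (assumption): negative absolute error of the
            # chosen input. A real decoder's error would go here instead.
            r = -abs(candidates[t][a] - targets[t])
            probs.append(p)
            actions.append(a)
            rewards.append(r)
        ret = 0.0
        for t in reversed(range(horizon)):
            ret += rewards[t]  # undiscounted reward-to-go from step t
            adv = ret - baseline[t]
            baseline[t] += 0.05 * adv
            for k in range(n_sources):
                indicator = 1.0 if k == actions[t] else 0.0
                # d log pi(a_t) / d logit_k = indicator - p_k
                theta[t][k] += lr * adv * (indicator - probs[t][k])
    return theta


# Hypothetical data: at each step a different candidate is closest to the target.
targets = [1.0, 2.0, 3.0, 4.0]
offsets = [[0.1, 1.0, 2.0], [1.5, 0.1, 1.0], [2.0, 1.0, 0.1], [1.0, 0.1, 2.0]]
candidates = [[y + d for d in row] for y, row in zip(targets, offsets)]

theta = train_selector(candidates, targets)
greedy = [row.index(max(row)) for row in theta]  # preferred source per step
print(greedy)
```

At inference time the targets are unavailable, so only the learned preferences in `theta` would be used to pick an input source at each step; the reward is needed during training alone.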

Paper Structure (17 sections, 19 equations, 4 figures, 4 tables, 1 algorithm)

Introduction
Background Study
Multi-Step-Ahead Time Series Prediction
Reinforcement Learning
Methodology
MDP Setting for Input Selection of the Decoder
The PG-based S2S Prediction Model
Training and Prediction
Experiments
Datasets Description
Accuracy Measure
Experimental Setup
Results and Discussion
Comparison on Prediction Performance
Analysis of the Effectiveness of Reinforcement Learning
...and 2 more sections

Figures (4)

Figure 1: Basic framework of S2S model with policy gradient.
Figure 2: Illustration of asynchronous training.
Figure 3: Training process of agent on different prediction tasks.
Figure 4: ETTh2-H24: The percentage of models in the training and validation datasets that are selected by the agent.
