Addressing Prediction Delays in Time Series Forecasting: A Continuous GRU Approach with Derivative Regularization
Sheo Yon Jhin, Seojin Kim, Noseong Park
TL;DR
The paper tackles prediction delay in time-series forecasting by moving beyond MSE-focused training to explicit time-derivative supervision. It introduces CONTIME, a continuous-time bi-directional GRU grounded in Neural ODEs that leverages a time-derivative loss $L_{\Delta t}$ and Hermite spline interpolation to produce timely, accurate forecasts. Across six diverse datasets, CONTIME demonstrates superior performance not only in MSE but also in DTW (shape) and TDI (timing), while mitigating delay in practical scenarios like stock movement and weather predictions. The work provides a practical, well-founded approach to real-time forecasting, with ablation studies and distribution-shift considerations reinforcing the robustness of derivative-based regularization for reducing prediction delays.
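To make the idea of derivative supervision concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes a discrete finite-difference stand-in for the time-derivative loss $L_{\Delta t}$, whose exact form is not given in this summary. The helper names `finite_diff`, `derivative_loss`, and `combined_loss` and the weight `lam` are hypothetical. The example compares a forecast delayed by one step with an on-time but biased forecast constructed to have the same MSE.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def finite_diff(series, dt=1.0):
    """Forward differences: an assumed discrete stand-in for d/dt."""
    return [(series[t + 1] - series[t]) / dt for t in range(len(series) - 1)]


def derivative_loss(pred, truth, dt=1.0):
    """MSE between the finite-difference slopes of prediction and truth."""
    return mse(finite_diff(pred, dt), finite_diff(truth, dt))


def combined_loss(pred, truth, lam=1.0, dt=1.0):
    """Value loss plus a weighted slope loss (lam is a hypothetical weight)."""
    return mse(pred, truth) + lam * derivative_loss(pred, truth, dt)


# Toy ground truth with a single peak.
truth = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# Forecast A: the truth delayed by one step (repeats the last observation).
delayed = [truth[0]] + truth[:-1]

# Forecast B: on time, but shifted by a constant chosen to match A's MSE.
bias = mse(delayed, truth) ** 0.5
biased = [v + bias for v in truth]

print("MSE        delayed:", mse(delayed, truth), " biased:", mse(biased, truth))
print("slope loss delayed:", derivative_loss(delayed, truth),
      " biased:", derivative_loss(biased, truth))
print("combined   delayed:", combined_loss(delayed, truth),
      " biased:", combined_loss(biased, truth))
```

On this toy series the two forecasts are indistinguishable by MSE, while the slope term is close to zero for the on-time forecast and clearly positive for the delayed one. That difference is the kind of signal a derivative-based regularizer can exploit.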
Abstract
Time series forecasting is essential in many application areas, including economic analysis and meteorology. The majority of time series forecasting models are trained using the mean squared error (MSE). However, MSE-based training leads to a limitation known as prediction delay, in which the prediction lags behind the ground truth. Prediction delay can cause serious problems in a variety of fields, e.g., finance and weather forecasting: predictions that trail the ground-truth observations are of little practical use even when their MSE is low. This paper proposes a new perspective on traditional time series forecasting tasks and introduces a new solution to mitigate the prediction delay. We introduce a continuous-time gated recurrent unit (GRU) based on the neural ordinary differential equation (NODE), which allows explicit time-derivatives to be supervised. We generalize the GRU architecture to continuous time and minimize the prediction delay through our time-derivative regularization. Our method achieves superior results on metrics such as MSE, Dynamic Time Warping (DTW), and Time Distortion Index (TDI). In addition, we demonstrate the low prediction delay of our method on a variety of datasets.
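The shape and timing metrics named above can be illustrated with a short sketch. The code below implements classic dynamic time warping with a squared pointwise cost and, as a simplifying assumption, summarizes timing by the mean absolute index offset along the warping path. This proxy is only in the spirit of TDI and is not the exact definition used in the paper; `dtw` and `mean_path_offset` are hypothetical names.

```python
def dtw(a, b):
    """Classic DTW with squared pointwise cost.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning a[i] with b[j], from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end; ties are broken in favor of the diagonal step.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


def mean_path_offset(path):
    """Mean |i - j| along the path: an assumed, simplified timing proxy."""
    return sum(abs(i - j) for i, j in path) / len(path)


truth = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0] + truth[:-2]  # same shape, two steps late

dist_delayed, path_delayed = dtw(truth, delayed)
dist_same, path_same = dtw(truth, truth)

print("DTW cost, delayed copy:", dist_delayed)
print("timing proxy, delayed copy:", mean_path_offset(path_delayed))
print("timing proxy, identical copy:", mean_path_offset(path_same))
```

For the delayed copy, the DTW cost is zero because warping can realign the two shapes, while the path offset is positive. An identical copy gives zero for both, which illustrates why shape and timing are worth measuring separately.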
