Loss Shaping Constraints for Long-Term Time Series Forecasting
Ignacio Hounie, Javier Porras-Valenzuela, Alejandro Ribeiro
TL;DR
Multi-step time series forecasting methods usually optimize average performance across the forecast window, which can produce uneven per-step errors. The paper introduces loss shaping constraints, which enforce a per-step upper bound on the expected loss, and augments them with a resilient relaxation so that the constrained problem remains feasible during training. A primal-dual algorithm is developed to solve both the constrained and the relaxed problems, and duality results show that the problem has a bounded duality gap despite its non-convexity. Experiments on transformer-based forecasters across multiple datasets show that constraining per-step losses shapes the error distribution while maintaining competitive mean performance, and that resilience improves feasibility and generalization in many settings.
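The per-step bounds and the resilient relaxation described above can be written out as optimization problems. The following is a minimal sketch rather than the paper's exact formulation: the symbols f_theta (forecaster), ell_t (loss at step t), epsilon_t (user-defined bound), zeta_t (relaxation) and the relaxation cost h are assumed notation, and the quadratic form mentioned for h is only one possible choice.

```latex
% Loss shaping constraints: best average loss subject to per-step bounds.
\begin{aligned}
\min_{\theta}\quad & \frac{1}{T}\sum_{t=1}^{T}
    \mathbb{E}_{(x,y)}\big[\ell_t(f_\theta(x), y)\big] \\
\text{s.t.}\quad & \mathbb{E}_{(x,y)}\big[\ell_t(f_\theta(x), y)\big]
    \le \epsilon_t, \qquad t = 1, \dots, T.
\end{aligned}

% Resilient relaxation (assumed form): each bound may be loosened by
% \zeta_t \ge 0 at a cost h(\zeta), e.g. h(\zeta) = \alpha \|\zeta\|_2^2.
\begin{aligned}
\min_{\theta,\; \zeta \ge 0}\quad & \frac{1}{T}\sum_{t=1}^{T}
    \mathbb{E}_{(x,y)}\big[\ell_t(f_\theta(x), y)\big] + h(\zeta) \\
\text{s.t.}\quad & \mathbb{E}_{(x,y)}\big[\ell_t(f_\theta(x), y)\big]
    \le \epsilon_t + \zeta_t, \qquad t = 1, \dots, T.
\end{aligned}
```

In this reading, the relaxed problem is always feasible because zeta can absorb any violation, while the cost h discourages loosening the bounds more than necessary.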
Abstract
Several applications in time series forecasting require predicting multiple steps ahead. Despite the vast amount of literature on the topic, both classical and recent deep-learning-based approaches have mostly focused on minimising performance averaged over the predicted window. We observe that this can lead to disparate distributions of errors across forecasting steps, especially for recent transformer architectures trained on popular forecasting benchmarks. That is, optimising performance on average can lead to undesirably large errors at specific time steps. In this work, we present a constrained learning approach for long-term time series forecasting that aims to find the best model in terms of average performance that respects a user-defined upper bound on the loss at each time step. We call our approach loss shaping constraints because it imposes constraints on the loss at each time step, and we leverage recent duality results to show that, despite its non-convexity, the resulting problem has a bounded duality gap. We propose a practical primal-dual algorithm to tackle it, and demonstrate that the proposed approach exhibits competitive average performance on time series forecasting benchmarks, while shaping the distribution of errors across the predicted window.
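To make the primal-dual idea concrete, here is a minimal NumPy sketch on a toy problem. It is not the authors' implementation: the shared-weight linear forecaster, the function names `per_step_mse` and `primal_dual_fit`, the step sizes, and the bounds in `eps` are all assumptions chosen for illustration. The model takes gradient steps on the Lagrangian (primal descent), and each dual variable grows while its step violates the bound and is clipped at zero otherwise (projected dual ascent).

```python
import numpy as np

def per_step_mse(w, X, Y):
    """Mean squared error at each forecast step; returns shape (T,)."""
    resid = (X @ w)[:, None] - Y          # same prediction compared to every step
    return (resid ** 2).mean(axis=0)

def primal_dual_fit(X, Y, eps, lr=0.05, dual_lr=0.1, epochs=2000):
    """Toy primal-dual training of a shared-weight linear forecaster.

    Assumed Lagrangian: sum_t (1/T + lam_t) * loss_t(w) - lam_t * eps_t.
    """
    n, d = X.shape
    T = Y.shape[1]
    w = np.zeros(d)
    lam = np.zeros(T)                     # one dual variable per forecast step
    for _ in range(epochs):
        resid = (X @ w)[:, None] - Y      # shape (n, T)
        weights = 1.0 / T + lam           # per-step weights in the Lagrangian
        grad = 2.0 / n * X.T @ (resid @ weights)
        w = w - lr * grad                 # primal descent step
        slack = per_step_mse(w, X, Y) - eps
        lam = np.maximum(lam + dual_lr * slack, 0.0)  # projected dual ascent
    return w, lam

# Toy data: one feature with unit second moment, two forecast steps whose
# targets (1*x and 2*x) cannot both be fit by a single shared weight.
x = np.linspace(-1.0, 1.0, 101)
x = x / np.sqrt(np.mean(x ** 2))
X = x[:, None]
Y = X @ np.array([[1.0, 2.0]])

# Loose bounds on both steps: behaves like plain average-loss training.
w_u, lam_u = primal_dual_fit(X, Y, eps=np.array([10.0, 10.0]))
loss_u = per_step_mse(w_u, X, Y)

# Tight bound on the second step only: the error is reshaped across steps.
w_c, lam_c = primal_dual_fit(X, Y, eps=np.array([10.0, 0.1]))
loss_c = per_step_mse(w_c, X, Y)

print("unconstrained per-step loss:", loss_u)
print("constrained per-step loss:  ", loss_c, "duals:", lam_c)
```

In this toy setting the two steps compete for a single weight, so tightening the bound on the second step lowers its loss at the expense of the first step. The dual variable of the tight constraint ends up positive, while the dual variable of the loose constraint stays at zero.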
