Context Neural Networks: A Scalable Multivariate Model for Time Series Forecasting
Abishek Sriramulu, Christoph Bergmeir, Slawek Smyl
TL;DR
This paper tackles the problem of forecasting many related time series by drawing on contextual information from neighbouring series without incurring the quadratic cost of attention or graph-based methods. It introduces ContextRNN, a dual-track architecture with a context-tracking branch and a main forecasting branch, augmented by Context Convolution and per-series Context Modifiers and built on a lightweight wdRNN core. The model uses adaptive Exponential Smoothing, frequency-domain contextual features, and a pinball loss to produce both point forecasts and predictive intervals, and it reports state-of-the-art accuracy at linear rather than quadratic complexity. Empirically, ContextRNN improves accuracy on large, real-world datasets while reducing computational demands, making scalable, context-aware multivariate forecasting practical for industrial applications.
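The pinball loss mentioned above is the standard quantile loss: training one output per quantile level yields a median point forecast plus the bounds of a predictive interval. The following is a minimal sketch of the textbook definition, not the paper's implementation; the function name `pinball_loss`, the argument names, and the quantile level used in the example are illustrative assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1).

    Hypothetical helper for illustration; names are not taken from the paper.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit of error;
        # over-prediction (diff < 0) costs (1 - tau) per unit of error.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


# With tau = 0.9, under-predicting by 2 is penalised nine times more
# heavily than over-predicting by 2.
under = pinball_loss([10.0], [8.0], 0.9)
over = pinball_loss([10.0], [12.0], 0.9)
```

With `tau = 0.5` the loss reduces to half the mean absolute error, which is why the median output can serve as the point forecast while a low and a high quantile level bracket the interval.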
Abstract
Real-world time series often exhibit complex interdependencies that cannot be captured in isolation. Global models, which learn jointly from the past data of multiple related time series while producing a separate forecast for each series, are now common. However, their forecasts for each individual series remain isolated, failing to account for the current state of its neighbouring series. Multivariate approaches such as multivariate attention and graph neural networks can explicitly incorporate inter-series information, thus addressing the shortcomings of global models. However, these techniques exhibit quadratic complexity per timestep, limiting scalability. This paper introduces the Context Neural Network, an efficient, linear-complexity approach for augmenting time series models with relevant contextual insights from neighbouring time series without significant computational overhead. The proposed method enriches predictive models by providing the target series with real-time information from its neighbours, addressing the limitations of global models while remaining computationally tractable for large datasets.
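To make the complexity contrast concrete: pairwise attention over N series needs a score for every pair of series at each timestep, which is N × N work, whereas a single shared summary can be built in one pass over the series and handed back to each of them in a second pass. The sketch below illustrates only that general idea. It is not the paper's Context Convolution or Context Modifier mechanism; mean pooling and the function name `add_shared_context` are assumptions made purely for illustration.

```python
def add_shared_context(states):
    """Append a shared cross-series summary to each series' state vector.

    `states` is a list of N per-series state vectors (lists of floats) at one
    timestep. Mean pooling is an illustrative choice, not the paper's method.
    """
    n_series = len(states)
    dim = len(states[0])

    # One pass over the N series builds the shared context: O(N * dim).
    context = [sum(state[j] for state in states) / n_series for j in range(dim)]

    # A second pass hands the same context to every series: O(N * dim).
    # No N x N matrix of pairwise scores is ever formed.
    return [state + context for state in states]


# Two series with two-dimensional states; each gains the same context vector.
enriched = add_shared_context([[1.0, 2.0], [3.0, 4.0]])
```

Each series ends up with its own state followed by the shared summary, so the cost per timestep grows in proportion to the number of series rather than to its square.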
