Integrating the Expected Future in Load Forecasts with Contextually Enhanced Transformer Models
Raffael Theiler, Leandro Von Krannichfeldt, Giovanni Sansavini, Michael F. Howland, Olga Fink
TL;DR
This work reframes load forecasting as a combined forecasting-regression task and shows that integrating the complete expected future context with historical data in contextually enhanced transformer models substantially improves accuracy. In two case studies, railway traction networks with timetable data and building energy forecasting with planned office occupancy data, using future contextual information reduces the mean absolute error (MAE) by an average of 26.6% and 56.3%, respectively, and produces markedly fewer outliers. The approach remains robust across architectures, although the best embedding strategy and the size of the benefit from future context vary by model; non-causal attention and separate past/future embeddings are central to the performance gains. Overall, the framework offers a scalable, generalizable path to more reliable energy forecasts in decentralized grids and could enable meaningful cost savings and better grid stability.
Abstract
Accurate and reliable energy forecasting is essential for power grid operators, who strive to minimize extreme forecasting errors that pose significant operational challenges and incur high intra-day trading costs. Planning information, such as anticipated user behavior, scheduled events, or timetables, provides substantial context that can improve forecast accuracy and reduce the occurrence of large forecasting errors. Existing approaches, however, lack the flexibility to integrate dynamic, forward-looking contextual inputs together with historical data. In this work, we conceptualize forecasting as a combined forecasting-regression task, formulated as a sequence-to-sequence prediction problem, and introduce contextually enhanced transformer models designed to leverage all available contextual information. We demonstrate the effectiveness of our approach in a primary case study on nationwide railway energy consumption forecasting, where integrating contextual information, particularly timetable data, into transformer models reduced the mean absolute error by 26.6% on average. An auxiliary case study on building energy forecasting, which leverages planned office occupancy data, further illustrates the generalizability of our method, showing an average reduction of 56.3% in mean absolute error. Our approach consistently outperforms other state-of-the-art methods, underscoring the value of context-aware deep learning techniques in energy forecasting applications.
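To make the combined forecasting-regression framing concrete, the following minimal sketch shows one way a single training sample could be assembled: the past window carries both observed load and context, while the forecast window carries only the planned context (for example, a timetable or occupancy schedule). The function name `build_sample`, its parameters, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_sample(load, context, t, past_len, horizon):
    """Split aligned series at index t (hypothetical helper, not from the paper).

    Returns:
      past   : (past_len, 1 + C) observed load and context before t
      future : planned context for the next `horizon` steps
      target : load to be predicted for the next `horizon` steps
    """
    load = np.asarray(load, dtype=float)
    context = np.asarray(context, dtype=float)
    if t - past_len < 0 or t + horizon > len(load):
        raise ValueError("window does not fit inside the series")
    # Forecasting part: history where both load and context were observed.
    past = np.column_stack([load[t - past_len:t], context[t - past_len:t]])
    # Regression part: expected future context, known ahead of time.
    future = context[t:t + horizon]
    # Quantity the model is trained to predict.
    target = load[t:t + horizon]
    return past, future, target

# Toy data: context is simply twice the load (assumption for illustration).
load = np.arange(10.0)
context = 2.0 * np.arange(10.0)
past, future, target = build_sample(load, context, t=6, past_len=4, horizon=3)
```

A sequence-to-sequence model would then take `past` and `future` as inputs and be trained to output `target`; how the two inputs are embedded and attended to is a separate modeling choice.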
