Integrating the Expected Future in Load Forecasts with Contextually Enhanced Transformer Models

Raffael Theiler; Leandro Von Krannichfeldt; Giovanni Sansavini; Michael F. Howland; Olga Fink

TL;DR

This work reframes load forecasting as a combined forecasting-regression task and demonstrates that integrating the complete expected future context with historical data via contextually enhanced transformer models substantially improves accuracy. In two case studies, one on railway traction networks and one on building energy with planned occupancy data, the authors report large reductions in forecasting error (average MAE reductions of 26.6% and 56.3%, respectively) and markedly fewer outliers when future contextual information is used. The approach remains robust across architectures, though embedding strategies and the degree of benefit from future context vary by model; non-causal attention and separate past/future embeddings are central to the performance gains. Overall, the framework offers a scalable, generalizable path to more reliable energy forecasts in decentralized grids and could enable meaningful cost savings and better grid stability.

Abstract

Accurate and reliable energy forecasting is essential for power grid operators who strive to minimize extreme forecasting errors that pose significant operational challenges and incur high intra-day trading costs. Incorporating planning information -- such as anticipated user behavior, scheduled events or timetables -- provides substantial contextual information to enhance forecast accuracy and reduce the occurrence of large forecasting errors. Existing approaches, however, lack the flexibility to effectively integrate both dynamic, forward-looking contextual inputs and historical data. In this work, we conceptualize forecasting as a combined forecasting-regression task, formulated as a sequence-to-sequence prediction problem, and introduce contextually enhanced transformer models designed to leverage all contextual information effectively. We demonstrate the effectiveness of our approach through a primary case study on nationwide railway energy consumption forecasting, where integrating contextual information into transformer models, particularly timetable data, resulted in a significant average mean absolute error reduction of 26.6%. An auxiliary case study on building energy forecasting, leveraging planned office occupancy data, further illustrates the generalizability of our method, showing an average reduction of 56.3% in mean absolute error. Our approach consistently outperforms other state-of-the-art methods, underscoring the value of context-aware deep learning techniques in energy forecasting applications.
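To make the reported error reductions concrete, the following minimal sketch computes a mean absolute error (MAE) and the relative reduction between two forecasts. It assumes the usual definition of relative reduction, measured against the forecast without context; how the paper averages its 26.6% and 56.3% figures across models and runs is not stated here. The load values and the function names are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and forecast load."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(mae_baseline, mae_enhanced):
    """Relative MAE reduction in percent, measured against the baseline."""
    return 100.0 * (mae_baseline - mae_enhanced) / mae_baseline


# Hypothetical hourly loads in arbitrary units; NOT data from the paper.
observed = [10.0, 12.0, 15.0, 11.0]
without_context = [12.0, 9.0, 13.0, 14.0]  # forecast from history only
with_context = [11.0, 11.0, 14.0, 12.0]    # forecast that also uses planning data

mae_without = mean_absolute_error(observed, without_context)
mae_with = mean_absolute_error(observed, with_context)
reduction = relative_reduction(mae_without, mae_with)

print(f"MAE without context: {mae_without:.2f}")
print(f"MAE with context:    {mae_with:.2f}")
print(f"Relative reduction:  {reduction:.1f}%")
```

In this toy example the MAE drops from 2.5 to 1.0, a relative reduction of 60%.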

Paper Structure (18 sections, 9 equations, 18 figures, 8 tables)

Introduction
Integration of the Expected Future across Electrical Energy Domains
Primary Case Study: Forecasting Dynamics in Railway Traction Networks
Auxiliary Case Study: Forecasting Dynamics in Building Energy Systems
Comparative Evaluation of Contextually Enhanced Transformers Across Architectures and Models
Discussion
Methods
Efficient integration of the expected future in Forecasting
Problem Formulation
Transformer for the timeseries regression task
Contextual Embedding
Training Objective
Datasets
Model Training and Evaluation Criteria
Baseline Models
...and 3 more sections

Figures (18)

Figure 1: Illustration of the proposed load forecasting framework with contextually enhanced transformer models, highlighting the case studies focused on the Swiss national railway traction network (Railway and Railway-Agg datasets) and load forecasting for buildings (Building Energy dataset) in Panel a. Panel b displays the collection of "expected future" data, including future occupancy information from building management, numeric weather forecasts, timetables, schedules and gross ton-kilometers (GTKM) estimates derived from the operational planning of the railway operator. Traditionally, methods such as pure timeseries forecasting (c.1) and regression models (c.2) are employed for load forecasting. Our proposed approach introduces the use of transformer architectures to learn a unified representation of the time series regression task (d). To efficiently integrate both past and future information for this task, we propose dividing the input data at the current time point $t$ (the present) and tokenizing the segments individually (c). We then apply distinct embedding strategies for past data (d.2) and future contextual information (d.1) in our contextually enhanced transformers.
Figure 2: Normalized Mean Absolute Error (NMAE) in normalized megawatts with and without the addition of future contextual information (FCI) on the Railway and Building Energy datasets. We list all models included in our evaluations: the contextually enhanced transformers Crossformer (CF), Spacetimeformer (STF) and Timeseries Transformer (TST), as well as PatchTST (PTST) and the multi-step linear models DLinear and TiDE.
Figure 3: Comparison of the robustness of contextually enhanced transformer models: Crossformer (CF), Spacetimeformer (STF) and Timeseries Transformer (TST) trained and evaluated on the Railway dataset in a) and on the Building Energy dataset in b). The linear regression model (EUB), currently the best performing model in production at the data supplier, is also included for comparison in a). Error bands illustrate the variation across different training initializations.
Figure 4: Model Performance Case Study: Swiss National Holiday (August 1, 2023). This detailed study focuses on the Swiss National Holiday event in the Railway dataset. For each of the contextually enhanced transformers, Crossformer (CF), Spacetimeformer (STF) and Timeseries Transformer (TST), we show scatter plots relating forecasted values to ground truth for the entire test set. The first row shows the models without planning information (-PLAN); the second row includes planning information. We overlay the 24 time steps of August 1 in black. Below, we present the predicted load curves for forecasts made with and without future contextual information. To highlight the impact of different data sources, we separately examine planning data and weather data in the forecast plot, illustrating the substantial benefits of integrating planning data alongside weather data. Error bands represent the variability across multiple training runs.
Figure 5: Model performance case study for the Building Energy dataset. We overlay the building energy profile with the day-ahead forecasts (48 time steps) of the contextually enhanced transformer models Crossformer (CF), Spacetimeformer (STF) and Timeseries Transformer (TST). We plot forecasts with and without future contextual information (-FCI). To highlight the impact of different future context sources, we separately display the impact of removing occupancy data (-OCC) in the forecast plot. Error bands show variation across training runs.
...and 13 more figures
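
The Figure 1 caption describes dividing the input at the present time point $t$, tokenizing the past and future segments individually, and embedding them with distinct strategies. The sketch below illustrates that idea under simplifying assumptions: each time step becomes one token, past tokens carry the observed load plus context features, future tokens carry context features only, and each segment gets its own randomly initialised linear embedding. The helper names (`split_at_present`, `make_linear_embedding`), the feature layout and all numbers are hypothetical; the paper's actual tokenization and embedding layers may differ.

```python
import random


def split_at_present(load, context, t):
    """Split aligned series at index t (the present).

    Past tokens hold the observed load plus its context features.
    Future tokens hold context features only, because the future load
    is the quantity to be predicted.
    """
    past = [[load[i]] + list(context[i]) for i in range(t)]
    future = [list(context[i]) for i in range(t, len(context))]
    return past, future


def make_linear_embedding(in_dim, d_model, rng):
    """Return a randomly initialised linear map from in_dim to d_model."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(d_model)]

    def embed(token):
        return [sum(w * x for w, x in zip(row, token)) for row in weights]

    return embed


# Hypothetical data: load observed for 4 steps, context known for 6 steps
# (for example a planned-activity score and a temperature forecast).
load = [50.0, 52.0, 61.0, 58.0]
context = [(0.2, 3.0), (0.3, 3.0), (0.6, 5.0), (0.5, 4.0), (0.7, 6.0), (0.4, 4.0)]
t = len(load)  # the present: the last step with an observed load

past_tokens, future_tokens = split_at_present(load, context, t)

rng = random.Random(0)
embed_past = make_linear_embedding(in_dim=3, d_model=8, rng=rng)    # load + 2 context features
embed_future = make_linear_embedding(in_dim=2, d_model=8, rng=rng)  # 2 context features only

# Both segments now share the width d_model and can be concatenated into
# one sequence for a transformer encoder.
sequence = ([embed_past(tok) for tok in past_tokens]
            + [embed_future(tok) for tok in future_tokens])
print(len(sequence), len(sequence[0]))
```

Separate embeddings let the two segments have different input widths while still producing tokens of a common size, so a downstream attention layer could attend across past and future tokens jointly.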
