Interpretable Short-Term Load Forecasting via Multi-Scale Temporal Decomposition

Yuqi Jiang; Yan Li; Yize Chen

TL;DR

The paper tackles the interpretability gap in short-term load forecasting by introducing a post-hoc global interpretability framework atop a multi-scale temporal decomposition. The method learns a linear combination of neural networks that attend to input features, while decomposing the time series into multiple trend and residual components across scales; auxiliary features are processed with LSTMs and combined to enable feature interpretability. A hybrid forecasting module using Transformers for low-frequency trend and CNNs for high-frequency residuals provides accurate predictions, with heatmaps revealing temporal and feature significance. Evaluations on Belgian and Australian datasets show superior accuracy (e.g., MSE $0.52$, MAE $0.57$, RMSE $0.72$ on standardized data) and clear interpretability, suggesting practical applicability for grid operators and insights into the drivers of load patterns.

Abstract

Rapid progress in machine learning and deep learning has enabled a wide range of applications in electricity load forecasting for power systems, such as univariate and multivariate short-term load forecasting. Although deep learning models capture the non-linearity of load patterns well and achieve high prediction accuracy, their interpretability for electricity load forecasting has received less study. This paper proposes an interpretable deep learning method that learns a linear combination of neural networks, each of which attends to one input time feature. We also propose a multi-scale time series decomposition method to handle complex temporal patterns. Case studies on the Belgium central grid load dataset show that the proposed model achieves better accuracy than the frequently applied baseline model. Specifically, the proposed multi-scale temporal decomposition achieves the best MSE, MAE, and RMSE of 0.52, 0.57, and 0.72, respectively. In terms of interpretability, the proposed method displays generalization capability and, unlike the other baseline methods, provides temporal as well as feature interpretability. Global time feature interpretabilities are also obtained, which capture the overall patterns, trends, and cyclicality in load data while revealing the significance of various time-related features in forming the final outputs.
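To make the idea of a multi-scale trend/residual split concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the trend at each scale is a centered moving average and the residual is whatever remains. The function names `moving_average` and `multi_scale_decompose` and the kernel sizes 24 and 168 (one day and one week of hourly samples) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def moving_average(x, kernel):
    """Centered moving average that keeps the input length (edge-padded)."""
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    window = np.ones(kernel) / kernel
    # "valid" convolution on the padded series returns len(x) samples.
    return np.convolve(padded, window, mode="valid")


def multi_scale_decompose(x, kernels=(24, 168)):
    """Split a 1-D series into one (trend, residual) pair per kernel size.

    The kernel sizes are hypothetical defaults chosen for hourly data.
    """
    x = np.asarray(x, dtype=float)
    trends = [moving_average(x, k) for k in kernels]
    residuals = [x - t for t in trends]
    return trends, residuals


# Usage: a synthetic hourly load with a daily cycle and a slow drift.
hours = np.arange(400)
load = np.sin(2 * np.pi * hours / 24) + 0.01 * hours
trends, residuals = multi_scale_decompose(load)
```

In this sketch the larger kernel yields a smoother, lower-frequency trend, which is the kind of component the TL;DR assigns to the Transformer branch, while the residuals carry the high-frequency content assigned to the CNN branch.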

Paper Structure (15 sections, 12 equations, 6 figures, 5 tables)

Introduction
Task Description and Model Framework
Datasets And Forecasting Tasks
Interpretability and Model Overview
Multi-Scale Temporal Decomposition
Decomposition Kernel Settings
Auxiliary Feature Process
Model Forecasting
Complexity Analysis
Result Evaluations and Discussions
Baseline Experiments
Results discussion
Analysis of Model Parameters and Interpretability
Generalization Study
Conclusion

Figures (6)

Figure 1: Overall framework of the proposed model. (a) The overall structure of the proposed method, in which temporal data decomposition and auxiliary feature representation are carried out separately. (b) The auxiliary feature representation: for each representation the model adopts an LSTM, and a linear layer then learns their combination. (c) The CNN structure used in the proposed model. (d) A diagram of the temporal decomposition.
Figure 2: The current load series at a specific time point and its lagged series. As can be derived from the cosine similarity heatmap, these lagged series illustrate high similarities with the current series.
Figure 3: Temporal dependencies for the Belgium load data, calculated with the similarity equation (\ref{equ:similarity}).
Figure 4: Hourly load forecasting visualization
Figure 5: (a) The auxiliary feature significance scores of the Belgium dataset. (b) and (c) The feature significance score of the test datasets $\text{Test}_1$ and $\text{Test}_2$.
...and 1 more figure
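The comparison described in the Figure 2 caption, between the current load series and its lagged copies, can be sketched as follows. This is an illustrative sketch only: the function name `lagged_cosine_similarity`, the window length, and the lag values are assumptions, and the paper's own similarity equation may differ in its details.

```python
import numpy as np


def lagged_cosine_similarity(series, window, lags):
    """Cosine similarity between the latest window and each lagged window.

    Returns a dict mapping lag -> similarity in [-1, 1].
    """
    series = np.asarray(series, dtype=float)
    current = series[-window:]
    sims = {}
    for lag in lags:
        start = len(series) - window - lag
        if start < 0:
            raise ValueError(f"lag {lag} reaches before the start of the series")
        lagged = series[start:start + window]
        denom = np.linalg.norm(current) * np.linalg.norm(lagged)
        # Define the similarity as 0 when either window is all zeros.
        sims[lag] = float(current @ lagged / denom) if denom > 0 else 0.0
    return sims


# Usage: a pure daily cycle sampled hourly (synthetic, not the Belgium data).
t = np.arange(240)
daily = np.sin(2 * np.pi * t / 24)
sims = lagged_cosine_similarity(daily, window=48, lags=[12, 24, 48])
```

For this synthetic daily cycle, lags that are multiples of 24 hours line up with the current window and give a similarity close to 1, while a 12-hour lag is in anti-phase and gives a value close to -1.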
