Benchmarking Time Series Foundation Models for Short-Term Household Electricity Load Forecasting
Marcel Meyer, David Zapata, Sascha Kaltenpoth, Oliver Müller
TL;DR
This work tackles short-term household electricity load forecasting by benchmarking zero-shot time series foundation models (TSFMs) against trained-from-scratch (TFS) Transformers on four real-world datasets. Using a multi-dataset time-series cross-validation design with leakage controls, it compares TSFMs such as Chronos, TimesFM, Time-MoE, Sundial, LagLlama, and Moirai, all run in an inference-only setting, against baselines such as PatchTST and the Temporal Fusion Transformer (TFT). Results indicate that TSFMs are largely competitive with, and sometimes superior to, state-of-the-art TFS Transformers, particularly with longer input context. Performance is model-dependent, however: LagLlama and Moirai can fall behind in limited-context settings. The findings suggest that zero-shot TSFMs can deliver accurate, scalable forecasts with reduced domain-specific training, and they point to domain-focused pretraining and selective fine-tuning as promising directions for energy time series forecasting.
Abstract
Accurate household electricity short-term load forecasting (STLF) is key to future sustainable energy systems. While various studies have analyzed statistical, machine learning, or deep learning approaches for household electricity STLF, recently proposed time series foundation models such as Chronos, TimesFM, or Time-MoE promise a new approach to the task. These models are trained on vast amounts of time series data and can forecast time series without explicit task-specific training (zero-shot learning). In this study, we benchmark the forecasting capabilities of time series foundation models against Trained-from-Scratch (TFS) Transformer-based approaches. Our results suggest that foundation models perform comparably to TFS Transformer models, while certain time series foundation models outperform all TFS models as the input size increases. At the same time, foundation models require less effort, as they need no domain-specific training and only limited contextual data for inference.
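As a rough illustration of how zero-shot evaluation on a load series can be set up without leakage, the sketch below uses a rolling-origin scheme: each forecast sees only the context window that ends where its target window begins. This is a minimal sketch under stated assumptions, not the study's actual protocol. The function names, the window sizes, the synthetic data, and the seasonal-naive forecaster (a stand-in for any inference-only model, and not one of the benchmarked models) are all hypothetical.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_obs: int, context_len: int, horizon: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (context, target) index ranges for rolling-origin evaluation.

    Each context window ends exactly where its target window starts, so a
    forecast never sees values from the period it is scored on.
    """
    origin = context_len
    while origin + horizon <= n_obs:
        yield range(origin - context_len, origin), range(origin, origin + horizon)
        origin += step


def seasonal_naive(context: List[float], horizon: int, season: int) -> List[float]:
    """Hypothetical placeholder forecaster: repeat the last full season."""
    last_season = context[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly load with a perfect daily cycle (illustrative data only).
load = [float(h % 24) for h in range(24 * 10)]

errors = []
for ctx_idx, tgt_idx in rolling_origin_splits(
    len(load), context_len=72, horizon=24, step=24
):
    context = [load[i] for i in ctx_idx]   # past values only
    actual = [load[i] for i in tgt_idx]    # held-out future values
    forecast = seasonal_naive(context, horizon=24, season=24)
    errors.append(mean_absolute_error(actual, forecast))

print(len(errors), max(errors))
```

Varying `context_len` in such a loop is one simple way to probe how forecast accuracy changes with the amount of input context, the effect the abstract highlights for some foundation models.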
