R$^2$Energy: A Large-Scale Benchmark for Robust Renewable Energy Forecasting under Diverse and Extreme Conditions
Zhi Sheng, Yuan Yuan, Guozhen Zhang, Yong Li
TL;DR
R$^2$Energy addresses the robustness of renewable energy forecasting under diverse and extreme weather by introducing a large-scale, leakage-free, NWP-assisted benchmark. It combines 902 wind and solar stations across four Chinese provinces with expert-annotated extreme events to stress-test models, enabling regime-aware evaluation beyond average accuracy. An empirical study across 16 models reveals a robustness–complexity trade-off: autoregressive GRU/RNN models with per-step NWP injection produce stable, accurate forecasts, while Transformer-based architectures become unstable under high-entropy conditions. The benchmark provides a principled platform for developing physically augmented AI and for evaluating models on industrially relevant metrics such as the Industrial Qualification Rate ($Q$) and Forecast Skill Score ($S$), guiding deployment in safety-critical grid operations.
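To make the idea of regime-aware evaluation concrete, the following is a minimal sketch rather than the benchmark's actual evaluation code. It assumes a boolean extreme-event mask per time step and uses the common skill score definition $1 - \mathrm{RMSE}_{\text{model}} / \mathrm{RMSE}_{\text{ref}}$ against a reference forecast; the paper's exact definitions of $Q$ and $S$ are not given here, and the names `regime_report`, `y_ref`, and `is_extreme` are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def regime_report(y_true, y_pred, y_ref, is_extreme):
    """Report RMSE and a skill score for all, normal, and extreme time steps.

    y_ref is a reference forecast (assumed, e.g. persistence);
    is_extreme is a boolean mask marking annotated extreme events (assumed).
    """
    regimes = {
        "all": np.ones_like(is_extreme, dtype=bool),
        "normal": ~is_extreme,
        "extreme": is_extreme,
    }
    report = {}
    for name, mask in regimes.items():
        err = rmse(y_true[mask], y_pred[mask])
        ref = rmse(y_true[mask], y_ref[mask])
        # Skill is undefined when the reference forecast is perfect.
        skill = 1.0 - err / ref if ref > 0 else float("nan")
        report[name] = {"rmse": err, "skill": skill}
    return report

# Synthetic, illustrative values only (normalized power in [0, 1]).
y_true = np.array([0.5, 0.6, 0.7, 0.2, 0.9, 0.1])
is_extreme = np.array([False, False, False, True, True, True])
y_pred = y_true + np.array([0.02, -0.02, 0.02, 0.2, -0.2, 0.2])
y_ref = y_true + np.array([0.1, -0.1, 0.1, 0.3, -0.3, 0.3])

report = regime_report(y_true, y_pred, y_ref, is_extreme)
for name, stats in report.items():
    print(name, stats)
```

In this toy data the aggregate RMSE sits between the normal and extreme values, which is how an average metric can hide a large error under extreme conditions.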
Abstract
The rapid expansion of renewable energy, particularly wind and solar power, has made reliable forecasting critical for power system operations. While recent deep learning models have achieved strong average accuracy, the increasing frequency and intensity of climate-driven extreme weather events pose severe threats to grid stability and operational security. Consequently, developing robust forecasting models that can withstand volatile conditions has become a paramount challenge. In this paper, we present R$^2$Energy, a large-scale benchmark for NWP-assisted renewable energy forecasting. It comprises over 10.7 million high-fidelity hourly records from 902 wind and solar stations across four provinces in China, providing the diverse meteorological conditions necessary to capture the wide-ranging variability of renewable generation. We further establish a standardized, leakage-free forecasting paradigm that grants all models identical access to future Numerical Weather Prediction (NWP) signals, enabling fair and reproducible comparison across representative state-of-the-art forecasting architectures. Beyond aggregate accuracy, we incorporate regime-wise evaluation with expert-aligned extreme weather annotations, uncovering a critical ``robustness gap'' that average metrics typically obscure. This gap reveals a stark robustness-complexity trade-off: under extreme conditions, a model's reliability is driven by its meteorological integration strategy rather than its architectural complexity. R$^2$Energy provides a principled foundation for evaluating and developing forecasting models for safety-critical power system applications.
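One way to picture a leakage-free, NWP-assisted setup is sketched below. This is an illustrative assumption, not the benchmark's actual data pipeline: each sample pairs past observed power with NWP features for the future horizon, and a chronological split with a gap keeps training and test windows from sharing timestamps. The helpers `make_samples` and `chronological_split` are hypothetical, and the `nwp` array is assumed to hold forecasts issued before each forecast origin.

```python
import numpy as np

def make_samples(power, nwp, lookback, horizon):
    """Build (history, future NWP, target) windows.

    power: (T,) observed generation.
    nwp:   (T, F) NWP features indexed by valid time (assumed to be
           issued before the forecast origin, so no future observations leak).
    """
    hist, fut_nwp, target = [], [], []
    for t in range(lookback, len(power) - horizon + 1):
        hist.append(power[t - lookback:t])   # past observations only
        fut_nwp.append(nwp[t:t + horizon])   # future NWP, same for every model
        target.append(power[t:t + horizon])  # values to be predicted
    return np.stack(hist), np.stack(fut_nwp), np.stack(target)

def chronological_split(n_samples, train_frac, gap):
    """Split sample indices in time order, dropping `gap` samples in between."""
    n_train = int(n_samples * train_frac)
    train_idx = np.arange(0, n_train)
    test_idx = np.arange(min(n_train + gap, n_samples), n_samples)
    return train_idx, test_idx

# Synthetic, illustrative series: 48 hourly steps, 2 NWP features.
T, lookback, horizon = 48, 6, 3
power = np.arange(T, dtype=float)
nwp = np.stack([np.arange(T, dtype=float), 2.0 * np.arange(T)], axis=1)

hist, fut_nwp, target = make_samples(power, nwp, lookback, horizon)
# A gap of lookback + horizon - 1 samples removes every window that
# would overlap both the training and the test period.
train_idx, test_idx = chronological_split(
    len(hist), train_frac=0.75, gap=lookback + horizon - 1
)
print(hist.shape, fut_nwp.shape, target.shape)
print(train_idx[-1], test_idx[0])
```

Because every model receives the same `fut_nwp` tensor, differences in accuracy can be attributed to how the model integrates the NWP signal rather than to unequal access to it.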
