R$^2$Energy: A Large-Scale Benchmark for Robust Renewable Energy Forecasting under Diverse and Extreme Conditions

Zhi Sheng, Yuan Yuan, Guozhen Zhang, Yong Li

TL;DR

R$^2$Energy tackles the robustness of renewable energy forecasting under diverse and extreme weather by introducing a large-scale, leakage-free, NWP-assisted benchmark. It combines 902 wind/solar stations across four Chinese regions with expert-annotated extreme events to stress-test models, enabling regime-aware evaluation beyond average accuracy. An empirical study across 16 models reveals a robustness–complexity trade-off: autoregressive GRU/RNN models with per-step NWP injection produce stable, accurate forecasts, while Transformer-based architectures exhibit instability under high-entropy conditions. The benchmark provides a principled platform for developing physically augmented AI and for evaluating industrially relevant metrics such as the Industrial Qualification Rate ($Q$) and Forecast Skill Score ($S$), guiding deployment for safety-critical grid operations.

Abstract

The rapid expansion of renewable energy, particularly wind and solar power, has made reliable forecasting critical for power system operations. While recent deep learning models have achieved strong average accuracy, the increasing frequency and intensity of climate-driven extreme weather events pose severe threats to grid stability and operational security. Consequently, developing robust forecasting models that can withstand volatile conditions has become a paramount challenge. In this paper, we present R$^2$Energy, a large-scale benchmark for NWP-assisted renewable energy forecasting. It comprises over 10.7 million high-fidelity hourly records from 902 wind and solar stations across four provinces in China, providing the diverse meteorological conditions necessary to capture the wide-ranging variability of renewable generation. We further establish a standardized, leakage-free forecasting paradigm that grants all models identical access to future Numerical Weather Prediction (NWP) signals, enabling fair and reproducible comparison across representative state-of-the-art forecasting architectures. Beyond aggregate accuracy, we incorporate regime-wise evaluation with expert-aligned extreme weather annotations, uncovering a critical "robustness gap" typically obscured by average metrics. This gap reveals a stark robustness–complexity trade-off: under extreme conditions, a model's reliability is driven by its meteorological integration strategy rather than its architectural complexity. R$^2$Energy provides a principled foundation for evaluating and developing forecasting models for safety-critical power system applications.
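To make the "leakage-free, NWP-assisted" idea concrete, the following is a minimal sketch of how forecasting samples might be assembled so that a model sees past power, past NWP, and future NWP over the forecast horizon, but never future power. The function name `make_windows`, the `Sample` fields, and the single-variable NWP series are illustrative assumptions, not the benchmark's actual data loader.

```python
from typing import List, NamedTuple, Sequence


class Sample(NamedTuple):
    """One forecasting sample (hypothetical layout, for illustration only)."""
    past_power: List[float]   # observed power, strictly before forecast time t
    past_nwp: List[float]     # NWP values aligned with past_power
    future_nwp: List[float]   # NWP forecasts for the horizon t .. t+H-1
    target: List[float]       # power to predict; used only as the label


def make_windows(power: Sequence[float], nwp: Sequence[float],
                 lookback: int, horizon: int) -> List[Sample]:
    """Slice two aligned hourly series into leakage-free samples.

    Inputs contain power only up to t-1, while NWP is available up to
    t+H-1. Future power appears only in `target`, never in the inputs.
    """
    if len(power) != len(nwp):
        raise ValueError("power and nwp must have the same length")
    if lookback <= 0 or horizon <= 0:
        raise ValueError("lookback and horizon must be positive")

    samples: List[Sample] = []
    # t is the first forecast step; the last valid t leaves room for H targets.
    for t in range(lookback, len(power) - horizon + 1):
        samples.append(Sample(
            past_power=list(power[t - lookback:t]),
            past_nwp=list(nwp[t - lookback:t]),
            future_nwp=list(nwp[t:t + horizon]),
            target=list(power[t:t + horizon]),
        ))
    return samples


# Toy usage with synthetic values (not benchmark data).
toy_power = [float(i) for i in range(10)]
toy_nwp = [10.0 * i for i in range(10)]
windows = make_windows(toy_power, toy_nwp, lookback=3, horizon=2)
print(len(windows), windows[0])
```

Giving every model the same `future_nwp` slice is what would keep a comparison fair in this sketch: differences in accuracy then come from how each architecture uses the weather signal rather than from unequal inputs.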

Paper Structure (37 sections, 7 equations, 5 figures, 6 tables)


Figures (5)

  • Figure 1: Overall framework of R$^2$Energy.
  • Figure 2: Intra-day and seasonal variability across the wind and solar datasets. The top row (a) illustrates the average wind speed (m/s) variations between daytime (07:00–18:00, solid bars) and nighttime (dotted bars). The bottom row (b) compares solar radiation (W/m²) during peak midday hours (11:00–14:00, solid bars) versus shoulder hours (morning/evening, dotted bars). The numerical labels above the brackets denote the difference between the two periods, while the global mean and standard deviation are listed for each dataset. Missing bars in W3 indicate data unavailability for specific seasons.
  • Figure 3: Impact of extreme weather on renewable power generation. (a) Solar (S2) and (b) Wind (W2) datasets. Shaded areas in (a) denote 95% confidence intervals; values in (b) indicate medians.
  • Figure 4: Power curve of the GE 2.5MW wind turbine. The curve illustrates the non-linear relationship between wind speed and power generation, highlighting three operational regions: cut-in ($v_{in} \approx 3$ m/s), rated ($v_{rated} \approx 12.5$ m/s), and cut-out ($v_{out} = 25$ m/s).
  • Figure 5: Efficiency-Accuracy Landscape (Solar S1, H=24). The scatter plot illustrates the trade-off between computational cost (Training Time per Epoch, Log Scale) and predictive performance (MAE). The dashed line represents the physical Baseline error (0.1272).
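The three operational regions named in the Figure 4 caption can be illustrated with a simple piecewise function. This is an idealized sketch only: the thresholds and the 2.5 MW rating come from the caption, but the cubic ramp between cut-in and rated speed is a common textbook simplification assumed here, not the manufacturer's measured curve, and the function name `idealized_power_mw` is hypothetical.

```python
def idealized_power_mw(v: float,
                       v_in: float = 3.0,      # cut-in speed (m/s), from the caption
                       v_rated: float = 12.5,  # rated speed (m/s), from the caption
                       v_out: float = 25.0,    # cut-out speed (m/s), from the caption
                       p_rated: float = 2.5) -> float:
    """Return an idealized turbine output in MW for wind speed v (m/s).

    Assumption: between cut-in and rated speed, power grows with the cube
    of wind speed, rescaled so it is 0 at v_in and p_rated at v_rated.
    """
    # Below cut-in the rotor does not generate. At or above cut-out the
    # turbine is assumed to shut down (treating v == v_out as shutdown is
    # a convention chosen for this sketch).
    if v < v_in or v >= v_out:
        return 0.0
    # Between rated and cut-out, output is capped at the rated power.
    if v >= v_rated:
        return p_rated
    # Cubic ramp region (assumed shape).
    frac = (v ** 3 - v_in ** 3) / (v_rated ** 3 - v_in ** 3)
    return p_rated * frac


# Toy usage: sample the curve at a few wind speeds.
for speed in (2.0, 6.0, 12.5, 20.0, 26.0):
    print(speed, round(idealized_power_mw(speed), 3))
```

The flat regions on either side of the ramp are why the speed-to-power mapping is strongly non-linear: a small error in forecast wind speed barely matters near the rated plateau but changes output substantially on the ramp, and an error near cut-out can flip output between full power and zero.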