Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France
Eloi Lindas, Yannig Goude, Philippe Ciais
TL;DR
This work tackles country-scale forecasting of renewable energy in France by building spatially explicit datasets that fuse weather data with capacity and location information for solar and wind power. It benchmarks three modeling paradigms—spatially averaged inputs, dimension-reduced inputs, and CNN-based image inputs—while rigorously evaluating cross-validation and hyperparameter optimization tailored for time-series data. Key findings show that preserving temporal structure in cross-validation yields more reliable generalization estimates, and that neural-network approaches, especially CNNs operating on power-weighted weather maps, effectively capture spatial patterns and outperform traditional tree-based models in extrapolating to expanding capacity scenarios. The study reports midterm horizon errors around 4–10% nRMSE and demonstrates the practicality of the approach for regional power supply forecasting, complemented by open access datasets for future benchmarking and method development.
Abstract
Accurate prediction of non-dispatchable renewable energy sources is essential for grid stability and price prediction. Regional power supply forecasts are usually obtained indirectly, by aggregating plant-level forecasts in a bottom-up approach; they typically incorporate lagged power values and do not exploit the potential of spatially resolved data. This study presents a comprehensive methodology for predicting solar and wind power production at country scale in France using machine learning models trained on spatially explicit weather data combined with spatial information about production site capacity. A dataset spanning 2012 to 2023 is built, using daily power production data from RTE (the national grid operator) as the target variable, with daily weather data from ERA5, production site capacity and location, and electricity prices as input features. Three modeling approaches are explored to handle spatially resolved weather data: spatial averaging over the country, dimension reduction through principal component analysis, and a computer vision architecture to exploit complex spatial relationships. The study benchmarks state-of-the-art machine learning models as well as hyperparameter tuning approaches based on cross-validation methods on daily power production data. Results indicate that cross-validation tailored to time series is best suited to reach low error. We find that neural networks tend to outperform traditional tree-based models, which face challenges in extrapolation due to the increasing renewable capacity over time. Model performance ranges from 4% to 10% in normalized root mean squared error (nRMSE) at the midterm horizon, similar to the error metrics of local models established at the single-plant level, highlighting the potential of these methods for regional power supply forecasting.
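To illustrate what cross-validation tailored to time series and an nRMSE score can look like in practice, here is a minimal Python sketch. It is not the authors' pipeline: the expanding-window scheme, the function names `expanding_window_splits` and `nrmse`, the choice to normalize the RMSE by the mean of the observed values, and the synthetic data are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs in which training data always
    precedes test data in time (hypothetical helper, expanding window)."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = n_samples if k == n_splits else train_end + fold
        yield np.arange(train_end), np.arange(train_end, test_end)


def nrmse(y_true, y_pred):
    """RMSE divided by the mean of the observations.
    Assumption: the paper's exact normalization may differ."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(y_true)


if __name__ == "__main__":
    # Synthetic daily series: installed capacity grows over time, so later
    # folds require extrapolating beyond the capacity seen during training.
    rng = np.random.default_rng(0)
    n_days = 365
    capacity = np.linspace(1.0, 2.0, n_days)   # hypothetical installed capacity
    weather = rng.uniform(0.1, 0.9, n_days)    # hypothetical weather-driven factor
    power = capacity * weather + rng.normal(0.0, 0.02, n_days)

    # Simple linear baseline fitted by least squares on each training window.
    X = np.column_stack([capacity * weather, np.ones(n_days)])
    for train, test in expanding_window_splits(n_days, n_splits=4):
        coef, *_ = np.linalg.lstsq(X[train], power[train], rcond=None)
        score = nrmse(power[test], X[test] @ coef)
        print(f"train={len(train):3d} test={len(test):3d} nRMSE={score:.3f}")
```

Unlike a shuffled K-fold split, every test window here lies strictly after its training window, so the score reflects forecasting on unseen future days rather than interpolation between neighbouring ones.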
