GloSoFarID: Global multispectral dataset for Solar Farm IDentification in satellite imagery
Zhiyuan Yang, Ryan Rad
TL;DR
GloSoFarID tackles the challenge of globally mapping solar-farm expansion by introducing a global, multispectral satellite dataset with $13$ bands and $10$ m resolution, covering $2021$ to $2023$. The authors implement a three-stage construction pipeline (initial data assembly, training of state-of-the-art models, and ensemble-based generation of new data with rigorous quality control) to produce a benchmark dataset of $13{,}703$ samples ($256 \times 256$ pixels each) across diverse regions. Benchmarks with FCN, Half-UNet, and U-Net establish baseline segmentation performance (IoU up to $79.3\%$ and F-score up to $87.8\%$) and demonstrate the dataset’s suitability for global solar-farm identification. Overall, GloSoFarID provides a timely, rich resource to drive machine learning-based monitoring of solar energy infrastructure and support sustainable energy planning.
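The two reported metrics can be illustrated with a minimal sketch for a single pair of binary masks. It assumes that the F-score is the pixel-wise F1 score and that empty masks on both sides count as perfect agreement; the summary above confirms neither convention. The function names `confusion_counts` and `iou_and_f_score` and the toy masks are hypothetical and are not taken from the GloSoFarID code.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    between two binary masks given as nested lists of 0/1 values."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou_and_f_score(pred, target):
    """Return (IoU, F1) for one predicted mask and one reference mask."""
    tp, fp, fn = confusion_counts(pred, target)
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    iou = tp / union                       # |A ∩ B| / |A ∪ B|
    f_score = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return iou, f_score


# Toy 4x4 masks: the prediction misses one column of the reference region.
pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
target = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

iou, f_score = iou_and_f_score(pred, target)
print(f"IoU = {iou:.3f}, F-score = {f_score:.3f}")
```

For a single mask pair the two quantities are linked by $F = 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$. This identity need not hold for scores averaged over many samples.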
Abstract
Solar Photovoltaic (PV) technology is increasingly recognized as a pivotal solution in the global pursuit of clean and renewable energy. This technology addresses the urgent need for sustainable energy alternatives by converting solar power into electricity without greenhouse gas emissions. It not only curtails global carbon emissions but also reduces reliance on finite, non-renewable energy sources. In this context, monitoring solar panel farms becomes essential for understanding and facilitating the worldwide shift toward clean energy. This study contributes to this effort by developing the first comprehensive global dataset of multispectral satellite imagery of solar panel farms. This dataset is intended to form the basis for training robust machine learning models, which can accurately map and analyze the expansion and distribution of solar panel farms globally. The insights gained from this endeavor will be instrumental in guiding informed decision-making for a sustainable energy future.
Repository: https://github.com/yzyly1992/GloSoFarID
