Monitoring Germany's Core Energy System Dataset: A Data Quality Analysis of the Marktstammdatenregister
Florian Kotthoff, Christoph Muschner, Deniz Tepe, Esther Vogt, Ludwig Hülk
TL;DR
This work examines the reliability of Germany's Marktstammdatenregister (MaStR), a central registry for energy units, by constructing a reproducible data-validation pipeline and implementing 90 SQL-based data tests across four categories: Basic, System size, Location, and Technology-specific. It couples a PostGIS-based data backend with dbt transformations and a SQLite-backed test store, and publishes the results as online dashboards via Datasette for transparent monitoring. A literature review establishes MaStR's growing research impact and shows why location-data accuracy matters for regional analyses. The results show that while basic data integrity is generally solid, substantial issues persist in location information and system-size consistency, particularly for wind and PV installations, underscoring the need for ongoing, collaborative validation and open sharing of validation tools. Overall, the paper delivers a first open, end-to-end validation workflow for MaStR that supports researchers, DSOs, and policymakers in improving data quality and trust in energy-system analyses.
Abstract
The energy system in Germany consists of a large number of distributed facilities, including millions of PV plants, wind turbines, and biomass plants. To understand and manage this system efficiently, accurate and reliable information about all facilities is essential. The Marktstammdatenregister (MaStR) serves as the country's central registry for units of the energy system. The reliability of its data is critical for the registry's usefulness, but few validation studies have been published. In this work, we review existing literature that relies on MaStR data and thereby show the registry's importance. We then build a data and testing pipeline for relevant data of the registry, focusing on two aspects: facility location and facility size. All test results are published online in a reproducible workflow. Hence, this work contributes to a reliable data foundation for the German energy system and starts an open validation process of the Marktstammdatenregister from an academic perspective.
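To give a sense of what a SQL-based data test on facility size and location can look like, the following is a minimal sketch, not the pipeline used in this work. It assumes a hypothetical `units` table with made-up columns (`unit_id`, `capacity_kw`, `lat`, `lon`), a handful of invented rows, and a rough bounding box for Germany; the real MaStR schema and the actual tests differ. Each test is a query that returns the rows violating an expectation, so an empty result means the test passes.

```python
import sqlite3

# Invented sample rows; identifiers and values are illustrative only.
ROWS = [
    ("PV-1", "solar", 9.8, 52.52, 13.40),
    ("PV-2", "solar", -5.0, 48.14, 11.58),   # implausible: negative capacity
    ("WT-1", "wind", 3000.0, 0.0, 0.0),      # implausible: coordinates at (0, 0)
    ("WT-2", "wind", 4200.0, 53.55, 9.99),
]

# Each test selects the FAILING rows. The bounding box is a generous
# approximation of Germany's extent, chosen for illustration.
TESTS = {
    "capacity_positive": "SELECT unit_id FROM units WHERE capacity_kw <= 0",
    "location_in_bbox": (
        "SELECT unit_id FROM units "
        "WHERE lat NOT BETWEEN 47.0 AND 55.5 OR lon NOT BETWEEN 5.5 AND 15.5"
    ),
}


def run_tests(rows):
    """Load rows into an in-memory SQLite database and run every test."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE units ("
        "unit_id TEXT, technology TEXT, capacity_kw REAL, lat REAL, lon REAL)"
    )
    con.executemany("INSERT INTO units VALUES (?, ?, ?, ?, ?)", rows)
    results = {}
    for name, sql in TESTS.items():
        # Collect the ids of all units that fail this test.
        results[name] = [r[0] for r in con.execute(sql + " ORDER BY unit_id")]
    con.close()
    return results


if __name__ == "__main__":
    for name, failing in run_tests(ROWS).items():
        status = "pass" if not failing else f"fail ({len(failing)} rows)"
        print(f"{name}: {status} {failing}")
```

Writing tests as queries that return failing rows keeps each check small and makes the offending units directly inspectable, which suits publishing results in a browsable store.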
