Monitoring Germany's Core Energy System Dataset: A Data Quality Analysis of the Marktstammdatenregister

Florian Kotthoff, Christoph Muschner, Deniz Tepe, Esther Vogt, Ludwig Hülk

TL;DR

This work tackles the reliability of Germany's Marktstammdatenregister (MaStR), a central registry for energy units, by constructing a reproducible data-validation pipeline and implementing 90 SQL-based data tests across Basic, System size, Location, and Technology-specific categories. It couples a PostGIS-based data backend with dbt transformations and a SQLite-backed test store, publishing online dashboards for transparent monitoring via Datasette. A literature review establishes MaStR's growing research impact and informs the relevance of location-data accuracy for regional analyses. The results show that while basic data integrity is generally solid, substantial issues persist in location information and system-size consistency, particularly for wind and PV installations, underscoring the need for ongoing, collaborative validation and open sharing of validation tools. Overall, the paper delivers a first open, end-to-end validation workflow for MaStR that supports researchers, DSOs, and policymakers in improving data quality and trust in energy-system analyses.

Abstract

The energy system in Germany consists of a large number of distributed facilities, including millions of PV plants, wind turbines, and biomass plants. To understand and manage this system efficiently, accurate and reliable information about all facilities is essential. In Germany, the Marktstammdatenregister (MaStR) serves as the central registry for units of the energy system. The reliability of this data is critical for the registry's usefulness, but few validation studies have been published. In this work, we review existing literature that relies on MaStR data and thereby show the registry's importance. We then build a data and testing pipeline for the relevant parts of the registry, focusing on two aspects: the location and the size of facilities. All test results are published online in a reproducible workflow. This work thus contributes to a reliable data foundation for the German energy system and starts an open validation process of the Marktstammdatenregister from an academic perspective.
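To illustrate the kind of size test such a pipeline might run, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the PostGIS and dbt setup. The table `pv_units`, its columns, and the ratio bounds are hypothetical and are not taken from the MaStR schema or from the paper's actual tests.

```python
import sqlite3

# Hypothetical allowed band for (module power / inverter power).
# Illustrative values only, not the thresholds used in the paper.
RATIO_MIN, RATIO_MAX = 0.5, 2.0


def run_size_test(conn):
    """Write failing units to `failed_size_test` and return their IDs."""
    conn.execute("DROP TABLE IF EXISTS failed_size_test")
    conn.execute(
        "CREATE TABLE failed_size_test (unit_id TEXT, module_kw REAL, inverter_kw REAL)"
    )
    # A unit fails if a value is missing, the inverter power is not positive,
    # or the power ratio falls outside the allowed band.
    conn.execute(
        """
        INSERT INTO failed_size_test
        SELECT unit_id, module_kw, inverter_kw
        FROM pv_units
        WHERE module_kw IS NULL
           OR inverter_kw IS NULL
           OR inverter_kw <= 0
           OR module_kw / inverter_kw NOT BETWEEN ? AND ?
        """,
        (RATIO_MIN, RATIO_MAX),
    )
    rows = conn.execute("SELECT unit_id FROM failed_size_test ORDER BY unit_id")
    return [row[0] for row in rows]


# Made-up example records (not real MaStR data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pv_units (unit_id TEXT, module_kw REAL, inverter_kw REAL)")
conn.executemany(
    "INSERT INTO pv_units VALUES (?, ?, ?)",
    [
        ("A", 10.0, 9.0),   # plausible ratio
        ("B", 10.0, 0.5),   # ratio far too high
        ("C", 5.0, None),   # missing inverter power
        ("D", 1.0, 8.0),    # ratio far too low
    ],
)
failed = run_size_test(conn)
print(failed)
```

Keeping the failing units in their own table mirrors the idea of a separate test store that can be published and inspected independently of the raw data.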

Paper Structure (17 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Importance of MaStR in literature: In a), the number of published papers for the five identified research domains is plotted over the publication year (up to $1^{\text{st}}$ April 2024). In b), the number of papers that use different tables from the MaStR is plotted. The papers are further subdivided according to the used location information, where regional information represents the use of aggregated data on zip-code, district, or state level.
  • Figure 2: Automated pipeline for downloading, processing, testing, and visualizing the MaStR dataset. In the first step (a), the required raw data is downloaded and written to a PostGIS database. Afterwards, in (b), data transformation and testing are performed using the framework dbt. All units that fail at least one test are written to an SQLite database. The SQLite database is then published together with monitoring dashboards using the framework Datasette.
  • Figure 3: Results of the System size and Location Tests. The white bands in the three plots represent the ranges of allowed values. In the scatter plots a)-c), each dot represents one PV system. In a), the power of the PV modules is compared to the power of the inverter; in b), the power of the PV modules is compared to the number of modules; and in c), the power of ground-mounted PV systems is compared to the area needed for the installation. Colors represent the logarithm of the ratio of the y-axis value to the x-axis value. In d), the share of units that have mismatching coordinates and districts is shown, where dark blue represents all units of the MaStR (including units that are not yet validated by DSOs).
  • Figure 4: Number of units whose coordinates do not lie within the district, plotted over the distance of the coordinates to the district boundaries. For wind turbines (left), most turbines lie close to their district, at a distance below 40 km. For PV systems, the distance to the district can be larger. The last bar in both plots is relatively large because it aggregates all units beyond a cutoff distance: more than 60 km for wind turbines and more than 300 km for PV systems.
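
The location check behind Figures 3 d) and 4 can be sketched in plain Python: decide whether a unit's coordinates fall inside its district, measure the distance to the district boundary if they do not, and count the distances in bins with one aggregated overflow bin. This is only an illustration under simplifying assumptions: coordinates are treated as planar values in km, the district is a made-up square, and the 20 km bin width is arbitrary. Only the 60 km cutoff is taken from the wind-turbine panel of Figure 4. It does not reproduce the paper's PostGIS-based implementation.

```python
import math
from collections import Counter


def point_in_polygon(pt, poly):
    """Ray casting: True if pt lies inside the polygon (list of (x, y) vertices)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_boundary(pt, poly):
    """Shortest planar distance from pt to any edge of the polygon."""
    px, py = pt
    best = math.inf
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        t = 0.0 if seg_len_sq == 0 else ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
        best = min(best, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return best


def bin_distances(distances, bin_width, overflow_from):
    """Count distances per bin; all values >= overflow_from share one final bin."""
    counts = Counter()
    for d in distances:
        if d >= overflow_from:
            counts[overflow_from] += 1
        else:
            counts[int(d // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical district (100 km x 100 km square) and unit coordinates in km.
district = [(0, 0), (100, 0), (100, 100), (0, 100)]
units = [(50, 50), (105, 50), (130, 50), (50, -75), (200, 200)]

mismatch_distances = [
    distance_to_boundary(u, district)
    for u in units
    if not point_in_polygon(u, district)
]
histogram = bin_distances(mismatch_distances, bin_width=20, overflow_from=60)
print(histogram)  # keys are the lower bin edges in km
```

Because every distance at or above the cutoff is counted under a single key, the last bin can end up larger than its neighbors even when few units lie at any one distance, which is the effect noted in the Figure 4 caption.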