How permanent are metadata for research data? Understanding changes in DataCite metadata

Dorothea Strecker

TL;DR

This study examines how permanent DataCite metadata for research data are, using PROV-based provenance data to quantify changes over a two-year window after initial DOI registration. It finds that 12.18% of records show changes to top-level DataCite elements. Changes tend to be incremental and are frequently administrative rather than wholesale revisions, which indicates that the records are stable enough for scientometric analyses. The study compares these results with other contexts, finding lower change rates than in Crossref and traditional cataloging, and discusses implications for continuous metadata maintenance, completeness, and repository practices. The findings point to the relative reliability of DataCite metadata and to the need for longer-term, finer-grained studies of repository-specific practices and their effect on metadata quality.

Abstract

With the move towards open research information, the DOI registration agency DataCite is increasingly used as a source for metadata describing research data, for example to perform scientometric analyses. However, there is a lack of research on how DataCite metadata describing research data are created and maintained. This paper addresses this gap by using DataCite metadata provenance information to analyze the overall prevalence and patterns of change to DataCite metadata records. Metadata change was observed for 12.18% of metadata records in the sample, and change tends to be incremental and not extensive. DataCite metadata records offer reliable descriptions of datasets and are stable enough to be used in scientometric research. The rate of change differs from previous studies of metadata change in other contexts, suggesting that there are differences in metadata practices between research data repositories and more traditional cataloging environments. The observed changes do not seem to fully align with idealized conceptualizations of metadata creation and maintenance for research data. In particular, the data do not show that metadata records are maintained routinely and continuously. Metadata change also has a limited effect on metadata completeness.
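
The prevalence figure reported in the abstract is, in essence, the share of records whose top-level elements differ between at least two consecutive versions. A minimal Python sketch of that kind of calculation follows. It is an illustration under stated assumptions, not the paper's actual pipeline: the input shape (a list of versions per DOI, oldest first), the function names `changed_elements` and `share_changed`, and the toy DOIs and values are all hypothetical.

```python
from typing import Any

# Assumed input shape (hypothetical): each record is a list of metadata
# versions, oldest first; each version maps top-level element names to values.
Version = dict[str, Any]


def changed_elements(versions: list[Version]) -> set[str]:
    """Return top-level elements whose value differs between consecutive versions."""
    changed: set[str] = set()
    for before, after in zip(versions, versions[1:]):
        # An element counts as changed if it was added, removed, or modified.
        for key in before.keys() | after.keys():
            if before.get(key) != after.get(key):
                changed.add(key)
    return changed


def share_changed(records: dict[str, list[Version]]) -> float:
    """Percentage of records with at least one changed top-level element."""
    if not records:
        return 0.0
    n_changed = sum(1 for versions in records.values() if changed_elements(versions))
    return 100 * n_changed / len(records)


# Toy data for illustration only; not taken from the paper.
records = {
    "10.1234/a": [{"titles": ["Draft"]}, {"titles": ["Final"]}],
    "10.1234/b": [{"titles": ["Stable"]}],
    "10.1234/c": [{"titles": ["X"]}, {"titles": ["X"], "subjects": ["bio"]}],
    "10.1234/d": [{"titles": ["Y"]}, {"titles": ["Y"]}],
}
print(f"{share_changed(records):.2f}% of records changed")
```

Comparing only top-level elements keeps the sketch close to the granularity named in the TL;DR; a finer-grained comparison would need to descend into nested properties.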

Paper Structure

This paper contains 24 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Schematic overview of an example metadata workflow.
  • Figure 2: Distribution of the number of changes to a metadata record.
  • Figure 3: Rate of metadata records that (A) use a metadata element in the first version and (B) change it at least once in later versions.
  • Figure 4: Types of changes by metadata element.
  • Figure 5: Distribution of the time passed between versions of metadata records in days.
  • ...and 2 more figures