echemdb Toolkit -- a Lightweight Approach to Getting Data Ready for Data Management Solutions

Albert K. Engstfeld, Johannes M. Hermann, Nicolas G. Hörmann, Julian Rüth

TL;DR

The paper addresses the challenge of implementing FAIR data practices without heavyweight infrastructure by introducing the echemdb toolkit, a lightweight, file-system-based workflow that uses YAML metadata and frictionless Data Packages, along with a unitpackage API for in-file data exploration. It presents end-to-end processes for data preparation, automatic metadata capture, standardization, and packaging, plus demonstrators for electrochemistry and literature data to showcase interoperability and visualization in browsers and notebooks. The approach lowers adoption barriers for institutions with limited RDM expertise and enables robust data sharing and reuse through a simple, extensible framework. Overall, the work provides a practical pathway to embed machine-readable metadata within routine research data workflows, enhancing findability, accessibility, interoperability, and reusability in a low-friction manner.

Abstract

According to the FAIR (findability, accessibility, interoperability, and reusability) principles, scientific data should always be stored with machine-readable descriptive metadata. Existing solutions to store data with metadata, such as electronic lab notebooks (ELNs), are often very domain-specific and not sufficiently generic for arbitrary experimental or computational results. In this work, we present the open-source echemdb toolkit for creating and handling data and metadata. The toolkit runs entirely on the file system level, which facilitates integration with other tools in a FAIR data life cycle and means that no complicated server setup is required. This also makes the toolkit more accessible to the average researcher since no understanding of more sophisticated database technologies is required. We showcase several aspects and applications of the toolkit: automatic annotation of raw research data with human- and machine-readable metadata, data conversion into standardised frictionless Data Packages, and an API for exploring the data. We also present web frameworks for visualising the data, using example data from research into energy conversion and storage.

Paper Structure (17 sections, 4 figures)

This paper contains 17 sections, 4 figures.

Figures (4)

  • Figure 1: An example set of metadata for data exchange formats, i.e., a) YAML, b) JSON, and c) XML, storing metadata in key-value pairs.
  • Figure 2: Snapshot of the graphical user interface of the Python autotag-metadata program (top) [AUTOTAG2024], which monitors the file system (bottom left) for file creation events and tags files with metadata. The program can be coupled with external editors, such as VSCodium (bottom right), which provide syntax highlighting and can be used for validating the metadata against a schema.
  • Figure 3: Example content of files for a) time series data stored as a CSV and b) metadata stored as YAML. The latter describes the structure of the CSV.
  • Figure 4: Snapshot of the CV database displayed on the echemdb website [ECHEMDBwebsite], generated from frictionless Data Packages. Page a) shows a list of entries with the most relevant descriptors and b) provides detailed information on the respective entry.
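
The idea behind Figure 3, a CSV file accompanied by metadata that describes its structure, can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the toolkit's own API: the column names, values, file name, and the `unit` key are hypothetical, and the descriptor is written as JSON here (one of the formats in Figure 1) rather than YAML so that no third-party package is needed. Only the `resources`, `path`, `schema`, and `fields` layout is taken from the frictionless Data Package convention.

```python
import csv
import io
import json

# Hypothetical time series standing in for a measured CSV file.
CSV_TEXT = """t,E,j
0.0,-0.10,0.02
0.1,-0.09,0.03
0.2,-0.08,0.05
"""

# Descriptor in the spirit of a Data Package: "resources", "path", "schema",
# and "fields" follow that layout; the "unit" key and all values are
# assumptions made for this illustration.
DESCRIPTOR = {
    "resources": [
        {
            "name": "example",
            "path": "example.csv",
            "schema": {
                "fields": [
                    {"name": "t", "type": "number", "unit": "s"},
                    {"name": "E", "type": "number", "unit": "V"},
                    {"name": "j", "type": "number", "unit": "A / m2"},
                ]
            },
        }
    ]
}


def load_columns(csv_text, descriptor):
    """Return {column name: (unit, values)} after checking the CSV header
    against the field names listed in the descriptor."""
    fields = descriptor["resources"][0]["schema"]["fields"]
    expected = [field["name"] for field in fields]
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != expected:
        raise ValueError(f"CSV header {reader.fieldnames} does not match {expected}")
    rows = list(reader)
    return {
        field["name"]: (field["unit"], [float(row[field["name"]]) for row in rows])
        for field in fields
    }


columns = load_columns(CSV_TEXT, DESCRIPTOR)

# The same key-value metadata serialised as JSON for exchange with other tools.
descriptor_json = json.dumps(DESCRIPTOR, indent=2)
```

Because the descriptor travels alongside the CSV as a plain file, any tool that reads JSON (or YAML) can recover the column names and units without a database or server.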