echemdb Toolkit -- a Lightweight Approach to Getting Data Ready for Data Management Solutions
Albert K. Engstfeld, Johannes M. Hermann, Nicolas G. Hörmann, Julian Rüth
TL;DR
The paper addresses the challenge of implementing FAIR data practices without heavyweight infrastructure. It introduces the echemdb toolkit, a lightweight, file-system-based workflow that uses YAML metadata and frictionless Data Packages, together with the unitpackage API for exploring the packaged data. It presents end-to-end processes for data preparation, automatic metadata capture, standardization, and packaging. It also includes demonstrators for electrochemistry and literature data that showcase interoperability and visualization in web browsers and notebooks. The approach lowers adoption barriers for institutions with limited research data management (RDM) expertise and enables robust data sharing and reuse through a simple, extensible framework. Overall, the work offers a practical way to embed machine-readable metadata in routine research data workflows, improving findability, accessibility, interoperability, and reusability with little friction.
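The automatic metadata capture mentioned above can be illustrated with a minimal sketch. The sketch assumes a hypothetical convention in which each raw file receives a sidecar file named `<file>.meta.yaml`. The sidecar is built by merging a user-supplied template of experiment-level metadata with fields that can be read from the file itself: name, size, modification time, and SHA-256 checksum. The function `annotate`, the sidecar naming, and the template keys are illustrative assumptions, not the echemdb toolkit's own interface. To stay within the Python standard library, the sidecar is serialised with `json`. YAML 1.2 is designed as a superset of JSON, so a YAML parser can read the result.

```python
import datetime
import hashlib
import json
import tempfile
from pathlib import Path


def annotate(raw_path, template):
    """Write a sidecar metadata file next to raw_path and return its path.

    Hypothetical convention: the sidecar is named "<raw file name>.meta.yaml".
    """
    raw_path = Path(raw_path)
    stat = raw_path.stat()
    # Start from the user-supplied, experiment-level metadata ...
    metadata = dict(template)
    # ... and add the fields that can be captured automatically from the file.
    metadata["file"] = {
        "name": raw_path.name,
        "size_bytes": stat.st_size,
        "modified": datetime.datetime.fromtimestamp(
            stat.st_mtime, tz=datetime.timezone.utc
        ).isoformat(),
        "sha256": hashlib.sha256(raw_path.read_bytes()).hexdigest(),
    }
    sidecar = raw_path.with_name(raw_path.name + ".meta.yaml")
    # JSON output is readable by YAML 1.2 parsers and needs no third-party package.
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "measurement.csv"
        raw.write_bytes(b"t,E\n0,0.1\n")
        # Placeholder template; a real template would describe the experiment.
        template = {"curator": "placeholder", "system": {"type": "placeholder"}}
        sidecar = annotate(raw, template)
        print(sidecar.name)  # → measurement.csv.meta.yaml
        print(sidecar.read_text(encoding="utf-8"))
```

In practice, a function like this could be called whenever an instrument writes a new file into a watched directory. The watching logic is omitted here.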
Abstract
According to the FAIR (findability, accessibility, interoperability, and reusability) principles, scientific data should always be stored with machine-readable descriptive metadata. Existing solutions for storing data with metadata, such as electronic lab notebooks (ELNs), are often highly domain-specific and not sufficiently generic for arbitrary experimental or computational results. In this work, we present the open-source echemdb toolkit for creating and handling data and metadata. The toolkit operates entirely at the file-system level. This facilitates integration with other tools in a FAIR data life cycle and means that no complicated server setup is required. It also makes the toolkit more accessible to the average researcher, since no understanding of more sophisticated database technologies is required. We showcase several aspects and applications of the toolkit: automatic annotation of raw research data with human- and machine-readable metadata, conversion of data into standardised frictionless Data Packages, and an API for exploring the data. We also demonstrate web frameworks for visualising the data, using example data from research on energy conversion and storage.
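The packaging and exploration steps described in the abstract can be sketched in the same spirit. A CSV file is written next to a `datapackage.json` descriptor whose Table Schema fields carry units, and a small reader rescales a column on request. Several choices here are assumptions made purely for illustration:

- storing the unit as an extra `unit` key on each field descriptor;
- embedding the experiment metadata under a `metadata` key on the resource;
- the helpers `write_package` and `load_column`;
- the tiny conversion table `TO_SI`.

This is not the frictionless or unitpackage API. Their documentation should be consulted for the actual interfaces.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical conversion factors to SI for the few units used in this sketch.
TO_SI = {"s": 1.0, "V": 1.0, "mV": 1e-3}


def write_package(directory, name, fields, rows, metadata):
    """Write <name>.csv and a datapackage.json describing it; return the JSON path."""
    directory = Path(directory)
    csv_path = directory / f"{name}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([field["name"] for field in fields])
        writer.writerows(rows)
    descriptor = {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path.name,  # relative to the descriptor
                "format": "csv",
                "schema": {"fields": fields},  # each field carries an assumed "unit" key
                "metadata": metadata,  # assumed location for the experiment metadata
            }
        ],
    }
    json_path = directory / "datapackage.json"
    json_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return json_path


def load_column(package_path, column, unit):
    """Read one column of the first resource and rescale it to the requested unit."""
    package_path = Path(package_path)
    descriptor = json.loads(package_path.read_text(encoding="utf-8"))
    resource = descriptor["resources"][0]
    field = next(f for f in resource["schema"]["fields"] if f["name"] == column)
    # Convert from the stored unit to SI, then from SI to the requested unit.
    factor = TO_SI[field["unit"]] / TO_SI[unit]
    with (package_path.parent / resource["path"]).open(newline="", encoding="utf-8") as handle:
        return [float(row[column]) * factor for row in csv.DictReader(handle)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fields = [
            {"name": "t", "type": "number", "unit": "s"},
            {"name": "E", "type": "number", "unit": "mV"},
        ]
        package = write_package(
            tmp, "example", fields, [[0, 100], [1, 250]], {"description": "placeholder"}
        )
        print(load_column(package, "E", "V"))  # potential values rescaled to volts
```

Keeping the unit next to the column name in the descriptor, rather than encoding it in a free-text CSV header, lets a reader convert values without parsing header strings.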
