fair_data.py: implementing FAIR data compliance in Tribchem

Lucrezia Berghenti, Elisa Damiani, Margherita Marsili, Maria Clelia Righi

TL;DR

The paper addresses the need to make high-throughput TribChem data FAIR by introducing dedicated tools. It presents fair_data.py and retrieve_data.py that extract TribChem results from MongoDB and produce FAIR-compliant JSON and TXT outputs, with a keyword-based retrieval option. By enabling deposition to Zenodo and other repositories, the approach supports reproducibility, data reuse, and data-driven materials discovery. The work provides a practical blueprint for integrating FAIR into automated DFT workflows, improving findability, accessibility, interoperability, and reusability of interfacial materials datasets.

Abstract

The increasing complexity and volume of data generated by high-throughput computational materials science require robust tools to ensure their accessibility, reproducibility, and reuse. In particular, integrating the FAIR Guiding Principles (Findable, Accessible, Interoperable, and Reusable) into computational workflows is essential to enable open science practices. TribChem is an open-source Python package developed for the automated simulation of solid-solid interfaces using density functional theory (DFT). While TribChem already incorporates several FAIR-aligned features, we present here a dedicated FAIR utility designed to transform TribChem results into FAIR-compliant datasets. This utility comprises two tools: fair_data.py, which automatically generates standardized machine- and human-readable outputs from the TribChem database, and retrieve_data.py, which facilitates efficient data extraction through a keyword-based interface. In this paper we demonstrate the capabilities of the FAIR utility with examples for bulk, surface, and interface systems. The implementation allows seamless integration with public repositories such as Zenodo, paving the way for reproducible research and fostering data-driven materials discovery.
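
As an illustration of the machine- and human-readable output pair described in the abstract, the following is a minimal sketch, not the actual fair_data.py implementation. It assumes the record has already been fetched from the database as a plain dictionary. The function name export_record, the field names, and the example values are hypothetical placeholders.

```python
from __future__ import annotations

import json
from pathlib import Path


def export_record(record: dict, out_dir: str, stem: str) -> tuple[Path, Path]:
    """Write one record as a JSON file (machine-readable) and a TXT file (human-readable)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    json_path = out / f"{stem}.json"
    txt_path = out / f"{stem}.txt"

    # default=str lets non-JSON types (e.g. datetime objects) be serialized as strings.
    json_path.write_text(json.dumps(record, indent=2, sort_keys=True, default=str))

    # Flat "key: value" lines, sorted by key for a stable layout.
    lines = [f"{key}: {record[key]}" for key in sorted(record)]
    txt_path.write_text("\n".join(lines) + "\n")
    return json_path, txt_path


# Hypothetical record: field names are assumptions, not the TribChem schema.
example = {"formula": "Fe", "mpid": "mp-150", "miller": [1, 1, 1], "functional": "PBE"}
```

Calling export_record(example, "fair_output", "Fe_111") would write Fe_111.json and Fe_111.txt into the fair_output directory. Keeping both files derived from the same dictionary is one simple way to guarantee that the two representations never diverge.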

Paper Structure

This paper contains 7 sections, 2 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Schematic representation of a TribChem workflow (e.g. surface energy convergence): the user provides input via the CLI, TribChem executes the calculations, and the results are stored in a specific MongoDB collection (e.g. PBE.slab_elements).
  • Figure 2: Structure of the command used to run the FAIR data utility. System, mp-code, formula, and Miller indices are mandatory arguments; collection is optional.
  • Figure 3: Schematic representation of the FAIR utility for the Fe(111) surface: the user provides the information identifying the specific object via the CLI, and the script fair_data.py connects to the MongoDB database and outputs the JSON and TXT files containing the data and metadata for that system.
  • Figure 4: JSON and TXT files produced by fair_data.py for the Fe(111) surface in the mp-150 crystal structure.
  • Figure 5: Usage example of retrieve_data.py: the keywords are listed in a .txt file and passed through the terminal to retrieve the corresponding quantities.
  • ...and 1 more figure
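
The command structure in the Figure 2 caption and the keyword-file retrieval in the Figure 5 caption can be sketched roughly as follows. This is an illustrative sketch only: the flag names (--system, --mpid, --formula, --miller, --collection), the helper names, and the one-keyword-per-line file layout are assumptions, not the actual interface of fair_data.py or retrieve_data.py. Only the split between mandatory and optional arguments follows the caption.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag names; four mandatory arguments and one optional, as in the Figure 2 caption.
    parser = argparse.ArgumentParser(description="Sketch of a FAIR export command line")
    parser.add_argument("--system", required=True, help="kind of system, e.g. bulk, surface, or interface")
    parser.add_argument("--mpid", required=True, help="mp-code of the structure, e.g. mp-150")
    parser.add_argument("--formula", required=True, help="chemical formula, e.g. Fe")
    parser.add_argument("--miller", required=True, nargs=3, type=int, metavar=("H", "K", "L"))
    parser.add_argument("--collection", default=None, help="optional MongoDB collection name")
    return parser


def read_keywords(text: str) -> list[str]:
    """Parse keyword-file contents, assuming one keyword per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def retrieve(record: dict, keywords: list[str]) -> dict:
    """Return only the requested keys; keywords absent from the record map to None."""
    return {key: record.get(key) for key in keywords}
```

With these assumed flags, the Fe(111) surface example would be requested as `--system surface --mpid mp-150 --formula Fe --miller 1 1 1`, and a keyword file containing the lines `formula` and `mpid` would select just those two entries from an exported record.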