fair_data.py: implementing FAIR data compliance in TribChem
Lucrezia Berghenti, Elisa Damiani, Margherita Marsili, Maria Clelia Righi
TL;DR
The paper addresses the need to make high-throughput TribChem data FAIR by introducing two dedicated tools: fair_data.py, which extracts TribChem results from MongoDB and produces FAIR-compliant JSON and TXT outputs, and retrieve_data.py, which provides keyword-based retrieval. By enabling deposition to Zenodo and other repositories, the approach supports reproducibility, data reuse, and data-driven materials discovery. The work offers a practical blueprint for integrating FAIR principles into automated DFT workflows, improving the findability, accessibility, interoperability, and reusability of interfacial materials datasets.
Abstract
The increasing complexity and volume of data generated by high-throughput computational materials science require robust tools to ensure that the data remain accessible, reproducible, and reusable. In particular, integrating the FAIR Guiding Principles (Findable, Accessible, Interoperable, and Reusable) into computational workflows is essential to enable open science practices. TribChem is an open-source Python software package developed for the automated simulation of solid-solid interfaces using density functional theory (DFT). While TribChem already incorporates several FAIR-aligned features, we present here a dedicated FAIR utility designed to transform TribChem results into FAIR-compliant datasets. This utility comprises two tools: fair_data.py, which automatically generates standardized machine- and human-readable outputs from the TribChem database, and retrieve_data.py, which facilitates efficient data extraction through a keyword-based interface. In this paper, we demonstrate the capabilities of the FAIR utility with examples for bulk, surface, and interface systems. The implementation allows seamless integration with public repositories such as Zenodo, paving the way for reproducible research and fostering data-driven materials discovery.
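To make the two-tool workflow concrete, the following is a minimal sketch of keyword-based retrieval followed by export to a machine-readable JSON file and a human-readable TXT file. It is not the TribChem implementation: the function names retrieve and export, the record fields, and the placeholder records are all hypothetical, and an in-memory list stands in for the MongoDB collections that the actual tools query.

```python
import json
from pathlib import Path

# Hypothetical placeholder records standing in for documents in the
# TribChem MongoDB database; field names are assumptions for illustration.
RECORDS = [
    {"system": "bulk", "formula": "Cu"},
    {"system": "surface", "formula": "Cu", "miller": [1, 1, 1]},
    {"system": "interface", "formula": "Cu-Fe"},
]


def retrieve(records, keyword):
    """Return records where any value contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        rec for rec in records
        if any(needle in str(value).lower() for value in rec.values())
    ]


def export(records, out_dir, stem):
    """Write the records to <stem>.json and <stem>.txt inside out_dir."""
    out_dir = Path(out_dir)
    json_path = out_dir / f"{stem}.json"
    txt_path = out_dir / f"{stem}.txt"

    # Machine-readable output: sorted keys give a stable, diffable file.
    json_path.write_text(json.dumps(records, indent=2, sort_keys=True))

    # Human-readable output: one labelled block of "key: value" lines per record.
    lines = []
    for index, rec in enumerate(records, start=1):
        lines.append(f"Record {index}")
        lines.extend(f"  {key}: {value}" for key, value in sorted(rec.items()))
    txt_path.write_text("\n".join(lines) + "\n")

    return json_path, txt_path
```

In this sketch, calling retrieve(RECORDS, "surface") selects the single surface record, and passing the result to export writes a JSON/TXT pair that could then be uploaded to a repository such as Zenodo. Emitting both formats from the same record list keeps the human-readable file consistent with the machine-readable one.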
