The Interoperability Challenge in DFT Workflows Across Implementations

S. K. Steensen, T. S. Thakur, M. Dillenz, J. M. Carlsson, C. R. C. Rego, E. Flores, H. Hajiyani, F. Hanke, J. M. G. Lastra, W. Wenzel, N. Marzari, T. Vegge, G. Pizzi, I. E. Castelli

TL;DR

The paper presents a practical framework for interoperable DFT workflows by introducing a universal JSON/YAML input–output standard that different workflow managers can translate into engine-specific inputs. It demonstrates engine-agnostic execution of an open-circuit voltage (OCV) workflow across CASTEP, GPAW, Quantum ESPRESSO, and VASP, and analyzes how to reconcile energetics from different engines, particularly for non-pristine, vacancy-containing structures. Key findings show that pristine-cell OCVs align across codes (often within a few hundredths of a volt), but vacancy-related energetics are sensitive to smearing, relaxation procedures, and pseudopotential choices, and therefore require cross-code validation and robust workflow design. The work culminates in design principles for robust automated DFT workflows and a semantic JSON-LD description of workflows to improve FAIRness, and it makes code and data available to enable reproducible cross-code benchmarking and scalable MAP integration.

Abstract

Interoperability and cross-validation remain a significant challenge in the computational materials discovery community. In this context, we introduce a common input/output standard designed for internal translation by various workflow managers (AiiDA, PerQueue, Pipeline Pilot, and SimStack) to produce results in a unified schema. This standard aims to enable engine-agnostic workflow execution across multiple density functional theory (DFT) codes, including CASTEP, GPAW, Quantum ESPRESSO, and VASP. As a demonstration, we have implemented a workflow to calculate the open-circuit voltage across several battery cathode materials using the proposed universal input/output schema. We analyze and resolve the challenges of reconciling energetics computed by different DFT engines and document the code-specific idiosyncrasies that make straightforward comparisons difficult. Motivated by these challenges, we outline general design principles for robust automated DFT workflows. This work represents a practical step towards more reproducible and interoperable workflows for high-throughput materials screening, while highlighting challenges of aligning electronic properties, especially for non-pristine structures.

Paper Structure

This paper contains 37 sections, 6 equations, 5 figures.

Figures (5)

  • Figure 1: The workflow begins by reading the input JSON and ends by constructing the output JSON. DFT stages comprise a full cell+position relaxation of the discharged unit cell followed by a relaxation of the charged cell. A single termination criterion bounds the relative volume expansion between charged and discharged states to a user-specified limit (default $5\%$); if the check fails, the charged-constrained cell is produced through scaling. Upon acceptance, the workflow generates and scales supercells, evaluates low- and high-vacancy configurations (single cc. vacancy/occupation), and outputs OCV values along with metadata. Here, cc. abbreviates charge carrier.
  • Figure 2: a) Impact of the smearing width (Gaussian and Fermi–Dirac) on the OCV values for Li$_2$Mn$_3$NiO$_8$ using the PerQueue/VASP implementation. The average and high SOC values are converged even at relatively high smearing, whereas the low SOC value exhibits a jump at low smearing. Note that two additional tests with Fermi–Dirac smearing widths of [0.005, 0.09] eV were conducted to ensure that a convergence plateau is reached for the lattice parameters of the discharged unit cell. Plots b) and c) show the smearing dependence of the a, b, and c lattice parameters for the simple cubic lattice. In the case of Li$_2$Mn$_3$NiO$_8$, no scaling is performed to construct the supercell, since the unit cell is deemed sufficiently large; the discharged unit cell and the low SOC structure therefore have identical lattice parameters, as seen in the second plot. Furthermore, because both the discharged and charged unit cells are cubic, the charged and charged-constrained cells are identical, and both are identical to the high SOC structure. The lattice parameters of all three of these structures are therefore represented together in plots b) and c). d) Band gap extracted for the low SOC structure at the different smearing values and types. The band gap opens abruptly where the low OCV value changes, although it closes again at very low Fermi–Dirac smearing.
  • Figure 3: OCV values calculated for Li$_2$Mn$_3$NiO$_8$, LiCoO$_2$, LiTiS$_2$, MgMo$_3$S$_4$, and LiFePO$_4$ across the DFT engines Quantum ESPRESSO, VASP, GPAW, and CASTEP.
  • Figure S4: Hierarchical representation of the JSON-LD description of the workflow. Purple circles represent objects (nodes), green open circles are attributes (edge-node), while green filled circles nest more objects that are not shown for clarity. Attributes with the prefix schema: and osmo: are described in the Schema.org vocabulary and OSMO ontology, respectively. Attributes without prefix inherit from the EMMO ontology. The uncollapsed hasInput links to an object of type Array with a human-readable description schema:description and the quantities within the array PositionVector and AtomicNumber.
  • Figure S5: The two PDOS for the low SOC structure in the workflow run with Li$_2$Mn$_3$NiO$_8$. The PDOS reveal very different electronic structures for the two cases.
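
The volume-expansion criterion described in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes that the relative expansion is measured as |V_charged − V_discharged| / V_discharged and that cells are given as three lattice vectors; the function names (`cell_volume`, `relative_volume_change`, `within_volume_limit`) and the helper `cubic` are hypothetical and are not taken from the paper's code. Only the default limit of 5% comes from the caption.

```python
from typing import Sequence

# A cell is three lattice vectors, each with three Cartesian components.
Cell = Sequence[Sequence[float]]


def cell_volume(cell: Cell) -> float:
    """Volume of a cell as |det| of its 3x3 lattice-vector matrix."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = cell
    det = (
        ax * (by * cz - bz * cy)
        - ay * (bx * cz - bz * cx)
        + az * (bx * cy - by * cx)
    )
    return abs(det)


def relative_volume_change(discharged: Cell, charged: Cell) -> float:
    """Assumed definition: |V_charged - V_discharged| / V_discharged."""
    v_discharged = cell_volume(discharged)
    if v_discharged == 0.0:
        raise ValueError("discharged cell has zero volume")
    return abs(cell_volume(charged) - v_discharged) / v_discharged


def within_volume_limit(discharged: Cell, charged: Cell, limit: float = 0.05) -> bool:
    """True if the relative volume change stays within the limit (default 5%)."""
    return relative_volume_change(discharged, charged) <= limit


def cubic(a: float) -> Cell:
    """Hypothetical helper: simple cubic cell with lattice parameter a."""
    return [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]


if __name__ == "__main__":
    discharged = cubic(4.00)
    print(within_volume_limit(discharged, cubic(4.05)))  # about 3.8% change
    print(within_volume_limit(discharged, cubic(4.10)))  # about 7.7% change
```

In this sketch the check only reports whether the limit is respected; how the workflow reacts to the outcome (for example, producing the charged-constrained cell through scaling) is left to the workflow manager.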