Automatic generation of input files with optimised k-point meshes for Quantum Espresso self-consistent field single point total energy calculations

Elena Patyukova, Junwen Yin, Susmita Basak, Samuel Pinilla Sanchez, Alin Elena, Gilberto Teobaldi

TL;DR

This work tackles the pervasive problem of selecting converged DFT parameters in high-throughput QE calculations by building a large convergence dataset and training models to predict the appropriate $k$-point density. It combines ensemble and graph-based models with an uncertainty framework based on conformalised quantile regression, which yields lower-bound predictions that avoid under-sampling, and it validates these predictions against a baseline fixed-density approach. The best-performing model, a random forest (RF) with CSLM features, achieves $R^2 \approx 0.70$ with low absolute errors, and the uncertainty estimates provide probabilistic guarantees of convergence. A publicly accessible web application automatically generates QE input files, enabling greener, faster, and more reproducible high-throughput DFT studies, though limitations remain for magnetic systems and other pseudopotential families.
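As an illustration of how a conformal correction can turn a raw quantile prediction into a lower bound with a coverage guarantee, the following is a minimal sketch of one-sided split-conformal calibration. It is not the authors' implementation: the function names, the toy numbers, and the choice of a held-out calibration set are assumptions made for this example, and the paper's exact procedure may differ.

```python
import math


def conformal_offset(cal_pred, cal_true, alpha):
    """Correction to subtract from a raw lower-quantile prediction.

    cal_pred: raw predictions on a held-out calibration set (hypothetical).
    cal_true: true converged k-point distances for the same compounds.
    alpha:    tolerated miscoverage, e.g. 0.1 for a 90% guarantee.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not cal_pred or len(cal_pred) != len(cal_true):
        raise ValueError("calibration lists must be non-empty and equal in length")
    # A positive score means the raw prediction overshot the true distance,
    # i.e. the mesh it implies would have been too coarse.
    scores = sorted(p - t for p, t in zip(cal_pred, cal_true))
    n = len(scores)
    # Finite-sample rank; the small tolerance guards against float round-off.
    rank = math.ceil((n + 1) * (1.0 - alpha) - 1e-12)
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return scores[rank - 1]


def lower_bound(pred, offset):
    """Calibrated lower bound on the converged k-point distance."""
    return pred - offset


# Toy calibration set: nine compounds, constant raw prediction of 0.30.
offset = conformal_offset(
    [0.30] * 9,
    [0.35, 0.34, 0.33, 0.32, 0.31, 0.30, 0.29, 0.28, 0.27],
    alpha=0.2,
)
print(lower_bound(0.30, offset))
```

Under the usual exchangeability assumption, the true converged distance of a new compound is at or above the calibrated bound with probability at least $1-\alpha$. An infinite offset signals that the calibration set is too small for the requested coverage, in which case a workflow would have to fall back to its densest allowed mesh.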

Abstract

Performing density functional theory (DFT) calculations requires a careful choice of computational parameters to ensure convergence and obtain meaningful results. This is a particularly important problem for high-throughput and agentic workflows, where the computational cost makes additional convergence studies undesirable. There is therefore a need for tools and models that can predict DFT parameters from basic input information, such as a structure. In this work, we develop a machine learning approach to predict the appropriate k-point sampling in DFT calculations and to generate the input files for Quantum Espresso self-consistent field calculations. To achieve this, we first generated a training dataset comprising over 20,000 materials, each converged to an energy threshold of 1 meV/atom. Several ML models were evaluated for their ability to predict the k-point distance, and uncertainty estimation was incorporated to guarantee that, for at least 85-95% of compounds, the predicted k-distance lies within the convergence region. The best-performing models are made publicly available through an open-access web application.
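To make the link between a predicted k-point distance and a QE input concrete, the sketch below converts a maximum k-point distance into a Monkhorst-Pack mesh and prints the corresponding `K_POINTS` card. It assumes one common convention, not confirmed by the text above: reciprocal lattice vectors include the factor $2\pi$, and the number of divisions along each direction is the length of the reciprocal vector divided by the distance, rounded up. The function names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def reciprocal_lengths(cell):
    """Lengths |b_i| (in 1/Angstrom, including 2*pi) for a cell given in Angstrom."""
    a1, a2, a3 = cell
    volume = dot(a1, cross(a2, a3))
    if volume == 0:
        raise ValueError("cell vectors are coplanar")
    factor = 2.0 * math.pi / abs(volume)
    products = (cross(a2, a3), cross(a3, a1), cross(a1, a2))
    return [factor * math.sqrt(dot(v, v)) for v in products]


def mesh_from_distance(cell, max_distance):
    """Smallest mesh whose k-point spacing does not exceed max_distance."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # The tolerance avoids an extra division caused by float round-off.
    return [
        max(1, math.ceil(b / max_distance - 1e-9))
        for b in reciprocal_lengths(cell)
    ]


def k_points_card(mesh):
    """Format an unshifted automatic K_POINTS card for pw.x."""
    return "K_POINTS automatic\n{} {} {} 0 0 0".format(*mesh)


# Example: a cubic cell with a = 4 Angstrom and a distance of 0.2 1/Angstrom.
cubic = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
print(k_points_card(mesh_from_distance(cubic, 0.2)))
```

Rounding up means the realised spacing is never coarser than the requested distance, which matches the aim of avoiding under-sampling.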

Paper Structure

This paper contains 15 sections, 6 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: The best model developed during this work is encapsulated in the web application for the generation of input files for QE pw.x calculations.
  • Figure 2: Distribution of compounds in the dataset over the 14 Bravais lattices.
  • Figure 3: Distribution of compounds with respect to the maximum distance between k-points. The vertical dashed line marks the maximum k-point distance (0.06 Å$^{-1}$) recommended for the reference calculations in [12].
  • Figure 4: Separate distributions with respect to the maximum distance between k-points for compounds with fixed Bravais lattice.
  • Figure 5: A comparison of the number of symmetrized k-points used for the same material in MC3D-PBEsol-v1 and in the optimized converged calculations in our database.
  • ...and 11 more figures