Automatic generation of input files with optimised k-point meshes for Quantum Espresso self-consistent field single point total energy calculations
Elena Patyukova, Junwen Yin, Susmita Basak, Samuel Pinilla Sanchez, Alin Elena, Gilberto Teobaldi
TL;DR
This work tackles the pervasive problem of selecting converged DFT parameters in high-throughput Quantum Espresso (QE) calculations by building a large convergence dataset and training models to predict the appropriate $k$-point density. It combines ensemble and graph-based models with an uncertainty framework based on conformalised quantile regression, which yields lower-bound predictions that avoid under-sampling, and it validates these predictions against a baseline fixed-density approach. The best-performing model, RF with CSLM features, achieves $R^2$ around $0.70$ with low absolute errors, and the uncertainty quantification provides probabilistic guarantees of convergence. A publicly accessible web application automatically generates QE input files, enabling greener, faster, and more reproducible high-throughput DFT studies, though limitations remain for magnetic systems and other pseudopotential families.
Abstract
Performing density functional theory (DFT) calculations requires a careful choice of computational parameters to ensure convergence and obtain meaningful results. This is a particularly important problem for high-throughput and agentic workflows, where the computational cost makes additional convergence studies undesirable. There is therefore a need for tools and models that can predict DFT parameters from basic input information, such as a structure. In this work, we develop a machine learning approach to predict the appropriate k-point sampling in DFT calculations and generate the input files for Quantum Espresso self-consistent field calculations. To achieve this, we first generated a training dataset comprising over 20,000 materials, with convergence defined by an energy threshold of 1 meV/atom. Several ML models were evaluated for their ability to predict the k-point distance, and uncertainty estimation was incorporated to guarantee that, for at least 85-95% of compounds, the predicted k-distance lies within the convergence region. The best-performing models are made publicly available through an open-access web application.
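The coverage guarantee mentioned above can be illustrated with a minimal sketch of one-sided split-conformal calibration applied to a lower-quantile prediction. This is not the authors' implementation: the function names, the one-sided form of the conformity score, and the use of a held-out calibration set are assumptions made for illustration. The idea is that a raw lower-quantile prediction of the k-distance is shifted downwards by an offset computed on calibration data, so that the true converged k-distance lies at or above the shifted bound with probability at least 1 - alpha.

```python
import math
import numpy as np


def conformal_lower_offset(q_lo_cal, y_cal, alpha):
    """Offset to subtract from raw lower-quantile predictions.

    q_lo_cal: raw lower-quantile predictions on a calibration set (hypothetical model output)
    y_cal:    true converged k-distances for the same calibration compounds
    alpha:    target miscoverage rate, e.g. 0.1 for 90% coverage
    """
    q_lo_cal = np.asarray(q_lo_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    n = len(y_cal)
    # Conformity score: positive when the raw bound overshoots the true value.
    scores = np.sort(q_lo_cal - y_cal)
    # Finite-sample corrected rank of the score quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return float("inf")
    return float(scores[k - 1])


def calibrated_lower_bound(q_lo_new, offset):
    """Shift raw lower-quantile predictions down by the calibrated offset."""
    return np.asarray(q_lo_new, dtype=float) - offset


if __name__ == "__main__":
    # Synthetic stand-in data; not taken from the paper's dataset.
    rng = np.random.default_rng(0)
    y = 0.3 + rng.normal(0.0, 0.03, size=2000)  # "true" k-distances
    q_lo = np.full_like(y, 0.29)                # deliberately optimistic raw bound
    offset = conformal_lower_offset(q_lo[:1000], y[:1000], alpha=0.1)
    bound = calibrated_lower_bound(q_lo[1000:], offset)
    print("empirical coverage:", np.mean(y[1000:] >= bound))
```

A smaller k-distance corresponds to a denser mesh, so choosing a bound below the true value errs on the side of over-sampling rather than under-sampling. Under the usual exchangeability assumption between calibration and new compounds, the marginal coverage of the shifted bound is at least 1 - alpha.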
