TBHubbard: tight-binding and extended Hubbard model database for metal-organic frameworks

Pamela C. Carvalho, Federico Zipoli, Alan C. Duriez, Marco Antonio Barroca, Rodrigo Neumann Barros Ferreira, Barbara Jones, Benjamin Wunsch, Mathias Steiner

TL;DR

This work applies a tight-binding lattice Hamiltonian and density functional theory to electronic structure calculations for MOFs. It provides a tight-binding representation of 10,000 MOFs and an Extended Hubbard model representation for a subset of 240 MOFs containing transition metals.

Abstract

Metal-organic frameworks (MOFs) are porous materials composed of metal ions and organic linkers. Due to their chemical diversity, MOFs can support a broad range of applications in chemical separations. However, the vast number of structural compositions encoded in crystallographic information files complicates application-oriented computational screening and design. The existing crystallographic data therefore requires augmentation by simulated data so that suitable descriptors for machine-learning and quantum computing tasks become available. Here, we provide extensive simulation data augmentation for MOFs within the QMOF database. We have applied a tight-binding lattice Hamiltonian and density functional theory to MOFs for performing electronic structure calculations. Specifically, we provide a tight-binding representation of 10,000 MOFs and an Extended Hubbard model representation for a subset of 240 MOFs containing transition metals, where the intra-site U and inter-site V parameters are computed self-consistently. The data supports computational workflows for identifying the structure-property correlations needed for inverse material design. For validation and reuse, we have made the data available at https://dataverse.harvard.edu/dataverse/tbhubbard/.
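
For orientation, the roles of the intra-site $U$ and inter-site $V$ parameters can be read off a generic textbook form of the extended Hubbard Hamiltonian. The expression below is a sketch under stated assumptions, not the functional used by the authors: the site energies $\varepsilon_i$, the sign convention for the hopping $t_{ij}$, and the site-dependent $U_i$ are notation introduced here for illustration.

$$
H = \sum_{i} \varepsilon_i\, n_i
  + \sum_{i \neq j} \sum_{\sigma} t_{ij}\, c^{\dagger}_{i\sigma} c_{j\sigma}
  + \sum_{i} U_i\, n_{i\uparrow} n_{i\downarrow}
  + \frac{1}{2} \sum_{i \neq j} V_{ij}\, n_i n_j ,
\qquad n_i = n_{i\uparrow} + n_{i\downarrow}
$$

Here $c^{\dagger}_{i\sigma}$ creates an electron with spin $\sigma$ on site $i$ and $n_{i\sigma} = c^{\dagger}_{i\sigma} c_{i\sigma}$. The first two terms form the tight-binding part, the $U_i$ term penalizes double occupancy of a single site, and the $V_{ij}$ term couples the occupations of different sites.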

Paper Structure

This paper contains 10 sections and 10 figures.

Figures (10)

  • Figure 1: (a) Illustration of the TBHubbard dataset. The QMOF database [Rosen2021] is indicated in pink, providing over 20,000 MOF structures. From this data collection, the TBHubbard database comprises two subsets of materials: the Tight-binding (in green) and Extended Hubbard (in blue) subsets with $\approx$ 10,000 and $\approx$ 200 materials, respectively; (b) t-SNE projection of tight-binding matrices, where points are colored according to the different databases analyzed in this study; (c) t-SNE projection of SOAP-3Å descriptors for metal atoms across the dataset. A preliminary PCA step reduced the descriptor dimensionality to 8 components, retaining 97% of the total variance. The color scheme for the t-SNE plots is as follows: pink for the QMOF database, blue for the EH subset, and green for the TB subset.
  • Figure 2: False-color image representing the normalized $|t_{ij}|^2$ tight-binding matrix coefficients of the localized orbital basis set for MOFs (a) FePtC8H4N6 (or qmof-3dfbcbd) and (b) CdNiC8H12N6 (or qmof-4d9a98c), with their respective structural representations shown above. For visualizing the matrix, the maximum intensity is set to 0.5 and the matrix diagonal to 0. The MOF structure images were created using VESTA [Momma2008].
  • Figure 3: (a) Density histogram comparing the transition metal distribution in the QMOF database with the Tight-binding (TB) and Extended Hubbard (EH) data sets. The inset shows the PCA projection of TB embeddings for symmetry-independent metal atoms. Panels (b)-(e) show probability density functions comparing the QMOF database with the TB and EH data sets with regard to the number of atoms in the unit cell, the pore-limiting diameter (PLD, in Å), the mass density (in g/cm$^3$), and the standard DFT band gap (in eV), respectively.
  • Figure 4: Scatter plot of the intra-site $U$ and inter-site $V$ Hubbard parameters for the Extended Hubbard (EH) data set, ordered by the occurrence of transition metals in each material, for calculations with different types of manifolds, i.e., with perturbations applied to (a) d-p or (b) d-s orbitals (see the Methods section for details). Scatter plot of band gap energies computed using standard DFT ($E_{\text{g}}^{\text{DFT}}$) and the DFT+U+V ($E_{\text{g}}^{\text{DFT+U+V}}$) framework for the EH subset considering (c) d-p and (d) d-s perturbations. The color map represents the atomic number of the transition metal associated with each material.
  • Figure 5: (a) TB embeddings for a Zn atom in qmof-fffeb7b, showing the top-7 strongest interaction blocks extracted from the Hamiltonian parameters. Each block is represented by a $13 \times 13$ matrix and ranked by the absolute maximum value within the block, capturing the most significant electronic interactions regardless of sign. The $y$-label indicates the atom for which the embedding is computed, while the $x$-label denotes the rank in the $i\text{–}j$ block. To aid visualization, the values are plotted within the range $[-2.5, 2.5]$, with any values outside this range clipped by the color palette. (b) Distribution of pairwise Euclidean distances between SOAP feature vectors computed with SOAP-3Å (red) and SOAP-5Å (blue). Vertical dotted lines indicate the mean Euclidean distance error in test set predictions, located at 33.346 for SOAP-3Å (red) and 246.464 for SOAP-5Å (blue). (c) Mean Euclidean distance error between true and predicted values (computed over all test samples) as a function of the number of tight-binding embedding blocks used. The error is reported for both short-range (SOAP-3Å) and long-range (SOAP-5Å) descriptors.
  • ...and 5 more figures
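
The post-processing described in the captions of Figures 2 and 5(a) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. The function names `display_intensity` and `top_blocks` are hypothetical. Normalizing by the largest off-diagonal entry is an assumption, since the caption does not say how the matrix is normalized. The dense square Hamiltonian with 13 consecutive orbitals per atom is also an assumed layout, as is counting the on-site block among the ranked blocks.

```python
import numpy as np


def display_intensity(t, vmax=0.5):
    """Return |t_ij|^2 prepared for a false-color plot (cf. Figure 2).

    Assumption: values are normalized by the largest off-diagonal entry;
    the diagonal is set to 0 and intensities are capped at vmax.
    """
    w = np.abs(np.asarray(t)) ** 2  # new array, so the input is not modified
    np.fill_diagonal(w, 0.0)        # matrix diagonal set to 0
    peak = w.max()
    if peak > 0:
        w = w / peak                # scale off-diagonal values to [0, 1]
    return np.clip(w, 0.0, vmax)    # maximum displayed intensity


def top_blocks(h, atom, n_orb=13, k=7):
    """Rank the n_orb x n_orb blocks in the row of `atom` (cf. Figure 5a).

    Blocks are ordered by their absolute maximum entry, largest first.
    Assumption: h is a dense (N * n_orb) x (N * n_orb) matrix with the
    orbitals of each atom stored contiguously.
    """
    h = np.asarray(h)
    n_atoms = h.shape[0] // n_orb
    rows = slice(atom * n_orb, (atom + 1) * n_orb)
    blocks = [h[rows, j * n_orb:(j + 1) * n_orb] for j in range(n_atoms)]
    strength = np.array([np.abs(b).max() for b in blocks])
    order = np.argsort(-strength, kind="stable")[:k]  # strongest first
    return order, np.stack([blocks[j] for j in order])
```

Calling `top_blocks(h, atom=i)` returns the indices of the partner atoms and an array of shape `(k, 13, 13)`. Clipping that array to $[-2.5, 2.5]$ before plotting would mirror the color range mentioned in the Figure 5 caption.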