Machine learning frontier orbital energies of nanodiamonds

Thorren Kirschbaum, Börries von Seggern, Joachim Dzubiella, Annika Bande, Frank Noé

TL;DR

This work tackles rapid design of nanodiamond materials by predicting frontier orbital energies with machine learning. It introduces ND5k, a dataset of 5,089 diamondoid and nanodiamond structures with DFTB-optimized geometries and DFT/PBE0 frontier energies, and benchmarks six ML models for interpolation and extrapolation to larger structures. PaiNN with average pooling delivers the best accuracy on ND5k, achieving MAEs of $0.16$ eV for $E_{\text{HOMO}}$ and $0.19$ eV for $E_{\text{LUMO}}$, while a PCA-reduced SOAP-ENN-S2S variant provides competitive performance. The results illustrate the benefit of integrating descriptor-informed node initialization with equivariant GNNs and establish ND5k as a useful resource for ML-guided nanodiamond photocatalyst design and beyond.
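To illustrate what "average pooling" means for a graph-level readout, the sketch below compares sum and mean pooling over per-atom contributions. It is not PaiNN's implementation; the function names and the per-atom values are hypothetical and chosen only for illustration. The point it shows is that a mean readout stays the same when identical local environments are replicated, whereas a sum readout grows with atom count.

```python
# Minimal sketch of sum vs. average pooling over per-atom contributions.
# All values below are made up for illustration; they are not ND5k data.

def sum_pool(contribs):
    """Sum per-atom contributions (scales with the number of atoms)."""
    return sum(contribs)

def mean_pool(contribs):
    """Average per-atom contributions (independent of the number of atoms)."""
    if not contribs:
        raise ValueError("need at least one atom")
    return sum(contribs) / len(contribs)

# Hypothetical per-atom outputs (eV) for a small structure.
small = [-6.0, -7.0, -6.5, -6.5]
# A larger structure built from the same local environments, three times over.
large = small * 3

print("sum pooling: ", sum_pool(small), sum_pool(large))
print("mean pooling:", mean_pool(small), mean_pool(large))
```

An orbital energy does not scale with system size, which is one plausible reason an averaging readout might suit this target; the text above only reports that the average-pooling variant performed best.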

Abstract

Nanodiamonds have a wide range of applications, including catalysis, sensing, tribology and biomedicine. To support nanodiamond design via machine learning, we introduce the new dataset ND5k, consisting of 5,089 diamondoid and nanodiamond structures and their frontier orbital energies. ND5k structures are optimized via tight-binding density functional theory (DFTB), and their frontier orbital energies are computed using density functional theory (DFT) with the PBE0 hybrid functional. We also compare recent machine learning models for predicting frontier orbital energies of structures similar to those they were trained on (interpolation on ND5k), and we test their ability to extrapolate predictions to larger structures. For both the interpolation and extrapolation tasks, we find the best performance using the equivariant graph neural network PaiNN. The second-best results are achieved with a message passing neural network using a tailored set of atomic descriptors proposed here.
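The two evaluation settings can be sketched in a few lines of Python: the mean absolute error (MAE) used as the metric, and a size-based split in which larger structures are held out for extrapolation. This is a rough sketch under stated assumptions, not the authors' protocol; the record layout (`n_atoms`, `e_homo`), the size threshold, and all numbers are hypothetical.

```python
# Minimal sketch: MAE metric and a size-based extrapolation split.
# Record fields, threshold, and values are hypothetical, not taken from ND5k.

def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def split_by_size(records, max_train_atoms):
    """Structures up to max_train_atoms go to the training/interpolation
    pool; larger structures are held out as the extrapolation set."""
    pool = [r for r in records if r["n_atoms"] <= max_train_atoms]
    held_out = [r for r in records if r["n_atoms"] > max_train_atoms]
    return pool, held_out

# Made-up records: atom count and a reference HOMO energy in eV.
records = [
    {"n_atoms": 26, "e_homo": -7.1},
    {"n_atoms": 71, "e_homo": -6.4},
    {"n_atoms": 96, "e_homo": -6.0},
    {"n_atoms": 190, "e_homo": -5.8},
]
pool, held_out = split_by_size(records, max_train_atoms=100)
print(len(pool), "structures in the pool,", len(held_out), "held out")

# MAE of two hypothetical predictions against two reference values (eV).
print(mae([-6.0, -5.5], [-6.2, -5.4]))
```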

Paper Structure (11 sections, 4 equations, 4 figures, 2 tables)

This paper contains 11 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Flowchart illustrating the composition of the ND5k structures. The example shows the nanodiamonds with C$_{35}$H$_{36}$ base structure, F surface covering, two B dopants, and full or partial F-termination. Color: H (grey), B (rose), C (black), N (dark blue), O (red), F (light blue), Si (green), P (orange).
  • Figure 2: HOMO (blue) and LUMO (orange) orbital energy distributions of the ND5k dataset.
  • Figure 3: a) Structure, b) HOMO contour plot, c) LUMO contour plot of the nanodiamond with ND5k index 3001 (P-doped, H-terminated, C$_{48}$H$_{48}$ base structure). Color: H (grey), C (yellow), P (orange). The well-localized HOMO is plotted with isovalue $\pm 0.05$, the diffuse LUMO with isovalue $\pm 0.01$.
  • Figure 4: Top: Validation learning curves of the graph neural networks on ND5k (a) HOMO and (b) LUMO energy training, averaged over all training runs (cf. table \ref{tab:ML}). Mean absolute validation error (MAE) in eV is plotted against the number of epochs (log scale). Bottom: Learning curves with respect to the training set size on ND5k (c) HOMO and (d) LUMO energy training. Mean absolute testing error (MAE) in eV is plotted against the number of training examples (log scale).