Teaching a Transformer to Think Like a Chemist: Predicting Nanocluster Stability

João Marcos T. Palheta, Octavio Rodrigues Filho, Mohammad Soleymanibrojeni, Alexandre Cavalheiro Dias, Diego Guedes-Sobrinho, Wolfgang Wenzel, Roland Aydin, Celso R. C. Rêgo, Maurício Jeomar Piotrowski

TL;DR

This work tackles the design of bimetallic nanoclusters by combining density functional theory (DFT) with a physics-guided transformer to predict formation energies and in/out stability in 13-atom icosahedral (ICO) X$_{12}$TM clusters. An FTTransformer is pretrained on a large unary dataset from the Quantum Cluster Database and then fine-tuned on a small bimetallic set, achieving mean absolute errors around $0.67$ eV with calibrated uncertainty and transferring rapidly to an unseen Fe-host domain. DFT reveals systematic trends in core-shell versus surface motifs linked to the $d$-band center, effective coordination number (ECN), bond lengths, and HOMO-LUMO gaps, while the transformer captures these physics-informed patterns, as shown by attention and SHAP explanations, yielding interpretable design rules. The approach is openly shared under FAIR/TRUE principles, enabling reproducible, interpretable screening of unexplored nanocluster chemistries for catalysis and energy conversion, with an emphasis on knowledge that transfers across host-dopant combinations.

Abstract

Atomically precise metal nanoclusters bridge the molecular and bulk regimes, but designing bimetallic motifs with targeted stability and reactivity remains challenging. Here we combine density functional theory (DFT) and physics-grounded predictive artificial intelligence to map the configurational landscape of 13-atom icosahedral nanoclusters X$_{12}$TM, with hosts X = Ti, Zr, Hf, and Fe, and a single transition-metal dopant spanning the 3$d$-5$d$ series. Spin-polarized DFT calculations on 240 bimetallic clusters reveal systematic trends in binding and formation energies, distortion penalties, effective coordination number, $d$-band centre, and HOMO-LUMO gap that govern the competition between core-shell (in) and surface-segregated (out) arrangements. We then pretrain a transformer architecture on a curated set of 2968 unary clusters from the Quantum Cluster Database and fine-tune it on bimetallic data to predict formation energies and in/out preference, achieving mean absolute errors of about $0.6$-$0.7$ eV and calibrated uncertainty intervals. The resulting model rapidly adapts to an unseen Fe-host domain with only a handful of labelled examples. At the same time, attention patterns and Shapley attributions highlight size mismatch, $d$-electron count, and coordination environment as key descriptors. All data, code, and workflows follow FAIR/TRUE principles, enabling reproducible, interpretable screening of unexplored nanocluster chemistries for catalysis and energy conversion.
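The effective coordination number named in the abstract can be illustrated with a short sketch. The text here does not say which definition is used, so the code below assumes the exponentially weighted form common in the nanocluster literature, $\mathrm{ECN}_i = \sum_j \exp[1 - (d_{ij}/d_{av}^i)^6]$, with the weighted average bond length $d_{av}^i$ obtained self-consistently. The function name `effective_coordination` and its parameters are hypothetical and not taken from the authors' code.

```python
import math

def effective_coordination(distances, tol=1e-10, max_iter=200):
    """Return (ECN, d_av) for one atom, given its distances to all other atoms.

    Hypothetical helper; assumes the exponentially weighted ECN definition,
    which may differ from the one used in the paper.
    """
    d_av = min(distances)  # starting guess: the shortest bond
    for _ in range(max_iter):
        # Weight each neighbour by how close it is relative to d_av.
        weights = [math.exp(1.0 - (d / d_av) ** 6) for d in distances]
        new_d_av = sum(d * w for d, w in zip(distances, weights)) / sum(weights)
        converged = abs(new_d_av - d_av) < tol
        d_av = new_d_av
        if converged:
            break
    ecn = sum(math.exp(1.0 - (d / d_av) ** 6) for d in distances)
    return ecn, d_av

# Central atom of an ideal icosahedron: 12 equal bonds (length is illustrative).
ecn_core, d_core = effective_coordination([2.7] * 12)
```

With 12 equal bonds every weight is 1, so the central site of an undistorted icosahedron has an ECN of 12, and distant atoms contribute almost nothing. Distortions induced by a dopant spread the bond lengths and move the ECN away from that ideal value.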

Paper Structure

This paper contains 14 sections, 3 equations, 7 figures.

Figures (7)

  • Figure 1: Schematic of the dataset and model pipeline. A 13-atom icosahedral cluster X$_{13}$ is considered with two single-atom substitution topologies: ($i$) inner-site (core) replacement X$_{12}$TM$^{in}$ and ($ii$) outer/surface-site replacement X$_{12}$TM$^{out}$. Here X $\in$ {Ti, Zr, Hf} and TM spans the 3$d$--5$d$ series; $in$ and $out$ denote substitution at the central atom and at one of the 12 surface vertices, respectively. Feature engineering produces continuous and categorical descriptors that are projected/embedded and concatenated, then passed through $N$ Transformer blocks (frozen during head-only fine-tuning) to capture cross-feature interactions. A lightweight MLP head (trainable) flattens the representation and predicts the targets, e.g., $E_{form}$ and $\Delta E_{tot}$ for $in$ and $out$ cases.
  • Figure 2: Modulus of the binding energy as a function of the TM atomic number for X$_{12}$TM$^{in}$ (triangle) and X$_{12}$TM$^{out}$ (circle) (X = Ti, Zr, Hf) nanoclusters. The energy decomposition comprises the binding energy of the X$_{12}$ ICO-derived structure ($E_b^u$), the interaction energy between X$_{12}$ and TM in the equilibrium geometry of X$_{12}$TM$^{in}$ or X$_{12}$TM$^{out}$ ($E_{int}$), and the distortion energy caused by the interaction between the TM species and the X$_{12}$ structure (${\Delta}E_{dist}$).
  • Figure 3: Relative total energy ($\Delta E_{tot}$) as a function of atomic number for Ti$_{12}$TM (blue circles), Zr$_{12}$TM (green triangles), and Hf$_{12}$TM (red squares) nanoclusters. Negative values indicate energetic stabilization of $in$ configurations, while positive values indicate stabilization of $out$ configurations.
  • Figure 4: Structural properties: the average bond lengths ($d_{\text{av}}$) and the effective coordination number (ECN) as a function of the TM atomic number for X$_{12}$TM$^{in}$ (triangle) and X$_{12}$TM$^{out}$ (circle) (X = Ti, Zr, Hf) nanoclusters.
  • Figure 5: Prediction-interval coverage and width, MAE, and $R^2$ score as functions of the fraction $x$ of Fe-host ($X_{12}$) entries placed in the holdout set. The fraction ranges from 0.0 to 1.0: at 0.0 all Fe-host entries are in the training set and the holdout set contains only entries with other host elements, while at 1.0 the opposite holds. The holdout set contains between 20 and 30 entries. At $x = 0.95$, which corresponds to adding only 2 Fe-host entries to the training set, all metrics improve; for $x \leq 0.65$ the metrics follow a stable trend. (a) Coverage of the prediction intervals at $\alpha = 0.1$; (b) mean prediction-interval width for both the $in$ and $out$ configurations; (c) MAE for both targets on the holdout set; (d) $R^2$ score for both targets.
  • ...and 2 more figures
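
The four quantities reported in the Figure 5 caption (interval coverage, mean interval width, MAE, and $R^2$) can be computed from holdout predictions as in the minimal sketch below. It uses the textbook definitions and is not taken from the authors' code; the function name `interval_metrics` and the sample numbers are hypothetical and purely illustrative. With $\alpha = 0.1$, well-calibrated intervals would be expected to cover roughly 90% of the true values.

```python
def interval_metrics(y_true, y_pred, lower, upper):
    """Coverage, mean interval width, MAE, and R^2 for one regression target.

    Hypothetical helper using textbook definitions; inputs are equal-length
    sequences of true values, point predictions, and interval bounds.
    """
    n = len(y_true)
    # Fraction of true values that fall inside their prediction interval.
    coverage = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)) / n
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"coverage": coverage, "mean_width": mean_width, "mae": mae, "r2": r2}

# Illustrative values only (eV); not data from the paper.
metrics = interval_metrics(
    y_true=[0.0, 1.0, 2.0, 3.0],
    y_pred=[0.5, 1.0, 2.0, 2.5],
    lower=[0.0, 0.5, 1.5, 2.0],
    upper=[1.0, 1.5, 2.5, 2.9],
)
```

Coverage and width are best read together: coverage can be driven to the nominal level simply by widening the intervals, so a model is only usefully calibrated when it reaches that level with narrow intervals.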