Interpretable Machine Learning of Nanoparticle Stability through Topological Layer Embeddings

Felipe Hawthorne, Leandro Seixas, James M. Almeida, Cristiano F. Woellner, Raphael M. Tromer

TL;DR

The paper introduces a data-efficient, physically interpretable machine-learning framework built on a fragmented, layer-resolved descriptor that uses a topology-driven definition to decompose nanoparticles into surface, intermediate, and core environments.

Abstract

The stability of chemically complex nanoparticles is governed by an immense configurational space arising from heterogeneous local atomic environments across surface and interior regions. Efficiently identifying low-energy configurations within this space remains a central challenge for first-principles-based materials discovery, particularly when the available reference data are limited. Here, we introduce a data-efficient and physically interpretable machine-learning framework based on a fragmented, layer-resolved descriptor that explicitly decomposes nanoparticles into surface, intermediate, and core environments using a topology-driven definition. This representation preserves a compact and fixed feature dimensionality while retaining spatial resolution, enabling controlled emphasis on different regions of the nanoparticle through physically motivated weighting schemes. Coupled with gradient-boosted decision tree models and a ranking-based learning strategy, the proposed framework enables accurate identification of the most stable nanoparticle configurations using only a few hundred density functional theory reference calculations. Ranking performance metrics demonstrate near-saturation of correlation, high top-k recall, and rapidly vanishing regret at moderate training-set sizes, highlighting the strong data efficiency of the approach. Beyond predictive performance, layer-weighting and SHAP-based interpretability analyses reveal how surface segregation, coordination topology, and local chemical disorder contribute differently to stability across spatial regions of the nanoparticle. These insights provide a transparent physical interpretation of the learned models and establish a natural pathway toward active learning-driven exploration of complex nanoparticle configurational spaces.
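As a rough illustration of the layer-resolved idea described in the abstract, the sketch below assigns atoms to topological layers and builds a fixed-length, weighted feature vector. It is not the authors' implementation: the distance cutoff, the rule that under-coordinated atoms form the surface layer, the use of breadth-first hop count as the layer index, and the function names `neighbor_lists`, `topological_layers`, and `layer_descriptor` are all assumptions made for this example.

```python
from collections import deque
from math import dist


def neighbor_lists(positions, cutoff):
    """Build a neighbor graph: atoms closer than `cutoff` are bonded (assumed rule)."""
    n = len(positions)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(positions[i], positions[j]) <= cutoff:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def topological_layers(nbrs, surface_cn, n_layers=6):
    """Layer 0 holds atoms with coordination below `surface_cn` (assumed rule).

    Deeper layers are given by the hop count to the nearest surface atom,
    capped at n_layers - 1 so the number of layers stays fixed.
    """
    layer = [None] * len(nbrs)
    queue = deque()
    for i, nb in enumerate(nbrs):
        if len(nb) < surface_cn:
            layer[i] = 0
            queue.append(i)
    while queue:
        i = queue.popleft()
        for j in nbrs[i]:
            if layer[j] is None:
                layer[j] = min(layer[i] + 1, n_layers - 1)
                queue.append(j)
    # Atoms never reached from the surface are placed in the deepest layer.
    return [n_layers - 1 if l is None else l for l in layer]


def layer_descriptor(layer, nbrs, atom_property, weights):
    """Return [w_L * mean property, w_L * mean coordination] for each layer L.

    The vector length is 2 * len(weights), independent of particle size.
    Empty layers contribute zeros.
    """
    feats = []
    for L, w in enumerate(weights):
        members = [i for i, l in enumerate(layer) if l == L]
        if members:
            mean_prop = sum(atom_property[i] for i in members) / len(members)
            mean_cn = sum(len(nbrs[i]) for i in members) / len(members)
        else:
            mean_prop = mean_cn = 0.0
        feats += [w * mean_prop, w * mean_cn]
    return feats
```

Here `atom_property` could be any per-atom quantity, such as a tabulated electronegativity, and `weights` plays the role of the per-layer weights $w_L$. Setting some weights below 1.0 reduces the influence of those layers on a downstream model without changing the feature dimensionality.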

Paper Structure

This paper contains 8 sections, 17 equations, and 5 figures.

Figures (5)

  • Figure 1: Layer-resolved structural, chemical, and electronic descriptors for Al$_{70}$Co$_{10}$Fe$_5$Ni$_{10}$Cu$_5$ decagonal quasicrystalline alloy nanoparticles. Nanoparticles are partitioned into six topological layers defined by graph-based coordination analysis. Shown are (a) mean atomic fractions, (b) mean Pauling electronegativity, (c) mean valence electron concentration, and (d) mean coordination number as a function of topological layer index.
  • Figure 2: Principal component analysis and model comparison for the ranking of Al$_{70}$Co$_{10}$Fe$_5$Ni$_{10}$Cu$_5$ decagonal quasicrystalline alloy nanoparticles. (a) Projection of the descriptor space onto the first two principal components, colored by the total energy $E_{\mathrm{tot}}$. (b) Total energy as a function of the first principal component, illustrating the non-linear relationship between dominant descriptor variance and energetic stability. (c) Comparison of baseline regression models and Optuna-optimized XGBoost in terms of ranking performance, quantified by the Spearman rank correlation and Recall@5.
  • Figure 3: Training (left) and test (right) performance of the Optuna-optimized XGBoost model under different layer-weighting schemes. (a,b) Uniform weighting of all topological layers ($w_L = 1.0$). (c,d) Surface-emphasized embedding with $w_{L0}=w_{L1}=1.0$ and reduced weights for deeper layers. (e,f) Intermediate-layer–emphasized embedding with $w_{L2}=w_{L3}=1.0$. (g,h) Core-emphasized embedding with $w_{L4}=w_{L5}=1.0$. Each panel reports predicted versus DFT total energies, illustrating how the relative emphasis on surface, intermediate, or core environments affects model generalization.
  • Figure 4: SHAP (SHapley Additive exPlanations) analysis of the Optuna-optimized XGBoost model for different layer-weighting schemes, shown in the same top-to-bottom order as Fig. 3. (a) Uniform weighting of all topological layers. (b) Surface-emphasized embedding. (c) Intermediate-layer–emphasized embedding. (d) Core-emphasized embedding. Left panels show SHAP summary plots, reporting the distribution of feature contributions across the dataset, while right panels rank features by mean absolute SHAP value, highlighting the dominant structural and chemical descriptors controlling the predicted total energy.
  • Figure 5: Learning curves for the ranking of AlFeCoNiCu nanoparticle configurations as a function of the training-set size $N_{\mathrm{train}}$. Results are shown for three test-set sizes ($N_{\mathrm{test}} = 10, 20, 30$) and two screening budgets ($k = 5, 10$). (a) Spearman rank correlation $\rho$ between predicted and DFT reference energies, probing global rank consistency. (b) Top-$k$ screening recall, quantifying the fraction of true low-energy structures recovered within the model-selected subset. (c) Top-$k$ regret, defined as the energy difference between the lowest-energy structure identified within the selected top-$k$ and the true global minimum. Error bars indicate the standard deviation over multiple random train-test splits and model initializations.
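
The three ranking metrics named in the Figure 5 caption can be sketched in a few lines of plain Python. This is a minimal illustration and not the paper's evaluation code. It assumes that lower energy means more stable, that the "true low-energy structures" are the $k$ lowest DFT energies in the evaluated set, and that the "global minimum" is the lowest DFT energy in that same set. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def spearman_rho(e_true, e_pred):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rt, rp = average_ranks(e_true), average_ranks(e_pred)
    n = len(rt)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5


def top_k(values, k):
    """Indices of the k lowest values."""
    return set(sorted(range(len(values)), key=lambda i: values[i])[:k])


def recall_at_k(e_true, e_pred, k):
    """Fraction of the true k lowest-energy structures found in the predicted top-k."""
    return len(top_k(e_true, k) & top_k(e_pred, k)) / k


def regret_at_k(e_true, e_pred, k):
    """Best true energy inside the predicted top-k minus the true minimum."""
    selected = top_k(e_pred, k)
    return min(e_true[i] for i in selected) - min(e_true)
```

A regret of zero means the true lowest-energy structure is inside the model-selected subset, even if the overall rank correlation is imperfect. This is why recall and regret are useful complements to $\rho$ when the goal is screening rather than accurate energies.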