Electronic structure prediction of medium and high entropy alloys across composition space

Shashank Pathrudkar, Stephanie Taylor, Abhishek Keripale, Abhijeet Sadashiv Gangan, Ponkrshnan Thiagarajan, Shivang Agarwal, Jaime Marian, Susanta Ghosh, Amartya S. Banerjee

TL;DR

This work tackles the expensive evaluation of electronic structure across composition space in medium and high entropy alloys by proposing a data-efficient ML workflow that predicts the ground-state electron density $\rho$ and derived energies. It combines Bayesian Active Learning with novel body-attached-frame descriptors and a separate $\delta\rho$ model to improve accuracy, enabling reliable predictions across binary, ternary, and quaternary alloys (and even quinary extensions). The approach demonstrates strong generalization to unseen compositions, defects, and segregation patterns, while reducing training data needs by factors up to 2.5 (ternary) and 1.7 (quaternary) compared to tessellation-based sampling, achieving chemical-accuracy-level energies in many cases. This framework substantially accelerates exploration of composition space for HEAs/MEAs and can be extended to other bulk materials and low-dimensional systems, providing a practical route to rapid materials discovery from first principles.

Abstract

We propose machine learning (ML) models to predict the electron density -- the fundamental unknown of a material's ground state -- across the composition space of concentrated alloys. From this, other physical properties can be inferred, enabling accelerated exploration. A significant challenge is that the number of sampled compositions and descriptors required to accurately predict fields like the electron density increases rapidly with species. To address this, we employ Bayesian Active Learning (AL), which minimizes training data requirements by leveraging uncertainty quantification capabilities of Bayesian Neural Networks. Compared to strategic tessellation of the composition space, Bayesian-AL reduces the number of training data points by a factor of 2.5 for ternary (SiGeSn) and 1.7 for quaternary (CrFeCoNi) systems. We also introduce easy-to-optimize, body-attached-frame descriptors, which respect physical symmetries and maintain approximately the same descriptor-vector size as alloy elements increase. Our ML models demonstrate high accuracy and generalizability in predicting both electron density and energy across composition space.
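The uncertainty-driven selection of training compositions described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`simplex_grid`, `distance_proxy`, `active_learning`) are hypothetical, and the uncertainty measure is a simple stand-in (distance to the nearest training composition) used only so the loop runs without a trained network. In the paper's workflow, the uncertainty would instead come from the Bayesian Neural Network.

```python
from itertools import product

import numpy as np


def simplex_grid(n_species, divisions):
    """Candidate compositions: fractions k/divisions that sum to 1."""
    counts = [c for c in product(range(divisions + 1), repeat=n_species)
              if sum(c) == divisions]
    return np.array(counts, dtype=float) / divisions


def distance_proxy(candidates, train):
    """PLACEHOLDER uncertainty (an assumption for illustration only):
    distance from each candidate to its nearest training composition.
    A Bayesian Neural Network would supply this value in practice."""
    diff = candidates[:, None, :] - train[None, :, :]
    return np.linalg.norm(diff, axis=2).min(axis=1)


def active_learning(candidates, initial_idx, uncertainty_fn,
                    n_queries, n_iterations):
    """Grow the training set by repeatedly adding the most uncertain candidates."""
    train_idx = [int(i) for i in initial_idx]
    for _ in range(n_iterations):
        u = uncertainty_fn(candidates, candidates[train_idx])
        u[train_idx] = -np.inf  # never re-select an existing training point
        queries = np.argsort(u)[::-1][:n_queries]  # highest uncertainty first
        train_idx.extend(int(i) for i in queries)
    return train_idx


# Ternary example: start from the 3 pure compositions, then add 3 query points.
candidates = simplex_grid(n_species=3, divisions=4)
pure = np.flatnonzero(candidates.max(axis=1) == 1.0)
train_idx = active_learning(candidates, pure, distance_proxy,
                            n_queries=3, n_iterations=1)
```

The example starts from the three pure compositions and adds three query points in one iteration, mirroring the step from AL1 to AL2 in the ternary case. Swapping `distance_proxy` for a model-based uncertainty estimate leaves the loop unchanged.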

Paper Structure

This paper contains 24 sections, 13 equations, 27 figures, 8 tables.

Figures (27)

  • Figure 1: Schematic representation of our Machine Learning model showing descriptor generation and mapping to electron density using a Bayesian Neural Network. The process begins with calculating atomic neighborhood descriptors $D(i)$ at each grid point $i$ for the atomic configuration snapshots provided in the training data. A Bayesian Neural Network is trained to provide a probabilistic map from the atomic neighborhood descriptors $D(i)$ to the electronic charge density and a corresponding uncertainty measure at grid point $i$. Applying the trained model to generate charge density predictions for a new query configuration requires three steps: descriptor generation for the query configuration, forward propagation through the Bayesian Neural Network, and aggregation of the point-wise charge density predictions $\rho(i)$ and uncertainty values to obtain the charge density field $\rho$ and the uncertainty field, respectively.
  • Figure 2: Iterative training for accurate prediction across the composition space of a binary alloy. (a) Error in $\rho$ prediction for Si$_x$Ge$_{1-x}$, where the model was trained using only $x = 0.50$ and tested on all $x \neq 0.50$. (b) Error in $\rho$ prediction for Si$_x$Ge$_{1-x}$, where the model was trained using $x = 0,\, 0.50,\, 1.00$ and tested at other compositions. The error across the entire composition space reduces significantly with the addition of only two extra training compositions. Legend: m0: Training, m1: Testing.
  • Figure 3: Training compositions for three levels of tessellation (T1, T2 and T4). The red dots show training compositions. The top row shows compositions for the ternary (SiGeSn) system and the bottom row shows compositions for the quaternary (CrFeCoNi) system. Note that we train the model T4 with the 4th iteration of tessellation, because the training compositions in the third iteration exclude available training compositions from the second iteration. The star depicts an additional point considered in the quaternary T2 model to capture information in the center, approximating the octahedron in the second tessellation of the tetrahedron.
  • Figure 4: Bayesian Active Learning to iteratively select training compositions for accurate prediction across the composition space of a ternary alloy. (a) NRMSE across the composition space after the 1st iteration of Active Learning, termed AL1, trained using only the 3 pure compositions shown as white circles. (b) Energy prediction error for model AL1 with 3 pure compositions. (c) Epistemic uncertainty in $\rho$ prediction across composition space after prediction with model AL1. Query points (additional training points) for the next iteration of Bayesian Active Learning are selected based on highest uncertainty regions shown in 'f'. (d) NRMSE across the composition space after the 2nd iteration of Active Learning. 3 additional training points are added as per the uncertainty contour in subfigure 'c'. This model is termed AL2. The NRMSE is low and consistent across the composition space, showing the effectiveness of selecting query points through uncertainty. (e) Error in energy prediction across composition space. The unit of energy error is Ha/atom. The energy error is within chemical accuracy across the composition space. (f) Epistemic uncertainty in $\rho$ prediction across composition space after prediction with model AL2. This figure uses the same colorbars for the AL1 and AL2 models. Refer to Figure S6 in the Supplemental Material for the figure with distinct colorbars.
  • Figure 5: Training compositions for the quaternary system for Active Learning models. Left: 4 training compositions used for model AL1. Middle: 10 training compositions used for model AL2. Right: 20 training compositions used for model AL3. Black spheres indicate compositions at vertices, blue spheres indicate compositions on edges, green spheres indicate compositions on faces, and red spheres indicate compositions inside the tetrahedron.
  • ...and 22 more figures
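
The aggregation step named in the Figure 1 caption (point-wise predictions and uncertainties assembled into fields) can be sketched as follows. This is a hedged illustration rather than the paper's code: the function name `aggregate_predictions` is hypothetical, and it assumes that the Bayesian Neural Network is evaluated with several stochastic forward passes, with the per-point mean taken as $\rho(i)$ and the per-point standard deviation taken as the uncertainty. The caption does not specify how the uncertainty is computed, so that choice is an assumption.

```python
import numpy as np


def aggregate_predictions(samples, grid_shape):
    """Assemble point-wise predictions into density and uncertainty fields.

    samples: array of shape (n_draws, n_points); each row is one stochastic
             forward pass over all grid points (assumed sampling scheme).
    grid_shape: shape of the real-space grid, e.g. (nx, ny, nz).
    """
    samples = np.asarray(samples, dtype=float)
    n_points = int(np.prod(grid_shape))
    if samples.ndim != 2 or samples.shape[1] != n_points:
        raise ValueError("samples must have shape (n_draws, prod(grid_shape))")
    rho = samples.mean(axis=0).reshape(grid_shape)    # predicted density field
    sigma = samples.std(axis=0).reshape(grid_shape)   # spread across draws
    return rho, sigma


# Two illustrative draws on a 2 x 2 x 2 grid (made-up numbers, not paper data).
draws = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
                  [3, 4, 5, 6, 7, 8, 9, 10]])
rho, sigma = aggregate_predictions(draws, (2, 2, 2))
```

The reshape assumes the grid points are stored in row-major order; a different flattening convention would need a matching reshape.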