Estimation of Electronic Band Gap Energy From Material Properties Using Machine Learning

Sagar Prakash Barad; Sajag Kumar; Subhankar Mishra

TL;DR

The paper predicts electronic band gap energy and gap type from fundamental material properties, without relying on preliminary DFT calculations or knowledge of the material structure. It introduces a clustered gap predictor (CGP) that partitions non-metals into five clusters and trains a regressor and a gap-type classifier for each cluster, alongside a shared metal/non-metal classifier. Working with benchmark AFLOW data of 55,298 samples and 9 features (including engineered electronegativity and group_numbers), it defines a joint evaluation score $\text{Score}$ that combines the regression, gap-type classification, and metal/non-metal results. CGP reaches an AUC-ROC of 0.99 for the metal/non-metal decision, an average cluster MAE of $0.2321$ eV, and a final score of $0.9336$, indicating improved predictive capability over a single-model approach. The study suggests future work on more advanced clustering, larger datasets, and extensions to other material properties.

Abstract

Machine learning techniques are utilized to estimate the electronic band gap energy and forecast the band gap category of materials based on experimentally quantifiable properties. The determination of band gap energy is critical for discerning various material properties, such as its metallic nature, and potential applications in electronic and optoelectronic devices. While numerical methods exist for computing band gap energy, they often entail high computational costs and have limitations in accuracy and scalability. A machine learning-driven model capable of swiftly predicting material band gap energy using easily obtainable experimental properties would offer a superior alternative to conventional density functional theory (DFT) methods. Our model does not require any preliminary DFT-based calculation or knowledge of the structure of the material. We present a scheme for improving the performance of simple regression and classification models by partitioning the dataset into multiple clusters. A new evaluation scheme for comparing the performance of ML-based models in material sciences involving both regression and classification tasks is introduced based on traditional evaluation metrics. It is shown that on this new evaluation metric, our method of clustering the dataset results in better performance.
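The clustering scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ClusteredGapPredictor`, the k-means clustering with farthest-point initialisation, the nearest-centroid metal classifier, the per-cluster mean and majority-vote models, and the toy data are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
from collections import Counter
from statistics import mean

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(points):
    """Component-wise mean of a list of feature tuples."""
    return tuple(mean(col) for col in zip(*points))

def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], x))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation (assumed)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Keep the previous centroid if a cluster happens to be empty.
        centroids = [centroid(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

class ClusteredGapPredictor:
    """Hypothetical sketch: shared metal classifier plus per-cluster models."""

    def __init__(self, n_clusters=5):
        self.n_clusters = n_clusters

    def fit(self, X, is_metal, gap_ev, gap_type):
        metals = [x for x, m in zip(X, is_metal) if m]
        rows = [(x, g, t) for x, m, g, t in zip(X, is_metal, gap_ev, gap_type) if not m]
        nonmetal_X = [x for x, _, _ in rows]
        # Shared metal / non-metal step (stand-in: nearest class centroid).
        self.class_centroids = [centroid(nonmetal_X), centroid(metals)]  # index 1 = metal
        # Partition the non-metals, then fit one model pair per cluster.
        self.cluster_centroids = kmeans(nonmetal_X, self.n_clusters)
        members = [[] for _ in range(self.n_clusters)]
        for x, g, t in rows:
            members[nearest(self.cluster_centroids, x)].append((g, t))
        # Stand-in models: mean gap (regression) and majority vote (gap type).
        # Assumes every cluster holds at least one training sample.
        self.gap_models = [mean(g for g, _ in m) for m in members]
        self.type_models = [Counter(t for _, t in m).most_common(1)[0][0] for m in members]
        return self

    def predict(self, x):
        if nearest(self.class_centroids, x) == 1:
            # Metals have no band gap, so no cluster model is consulted.
            return {"is_metal": True, "gap_ev": 0.0, "gap_type": None}
        c = nearest(self.cluster_centroids, x)
        return {"is_metal": False, "gap_ev": self.gap_models[c], "gap_type": self.type_models[c]}

# Made-up two-feature toy data (not from the AFLOW dataset).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
     (10.0, 10.0), (10.1, 9.8), (9.9, 10.2)]
is_metal = [True] * 3 + [False] * 6
gap_ev = [0.0, 0.0, 0.0, 1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
gap_type = [None] * 3 + ["direct"] * 3 + ["indirect"] * 3

model = ClusteredGapPredictor(n_clusters=2).fit(X, is_metal, gap_ev, gap_type)
print(model.predict((5.1, 5.0)))
```

The point of the sketch is the routing: a sample first passes the shared metal/non-metal step, and only non-metals are sent to the regressor and gap-type classifier of their nearest cluster. In practice each stand-in would be replaced by a trained model of the kind the paper evaluates.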

Paper Structure (7 sections, 1 equation, 11 figures, 5 tables)

Introduction
Materials and Methodology
Machine Learning Algorithms
Architectures
Evaluation Metrics
Results
Conclusion

Figures (11)

Figure 1: Feature importance plot for band gap regression
Figure 2: Feature importance plot for gap type classification
Figure 3: Correlation matrix for the dataset.
Figure 4: The first architecture without the utilisation of clustering on non-metals.
Figure 5: The second architecture with clustering on non-metals. We call this the clustered gap predictor (CGP).
...and 6 more figures
