
Will it form a glass? Tackling glass formation using binary classification

Diogo P. L. Carvalho, Ana C. B. Loponi, Daniel R. Cassar

Abstract

Glass formation is one of the most important and fundamental open problems in glass science. Predicting whether a liquid can be easily frozen into a glass appears simple but is far from it. In this communication, we address glass formation in inorganic nonmetallic liquids using binary classification to predict the probability that a given liquid will form a glass under typical laboratory conditions. Using a dataset of more than 50,000 examples, we trained random forest classifiers that achieved ROC-AUC values around 0.89 and PR-AUC close to 0.95 on the holdout dataset (i.e., unseen data). A rigorous model selection routine was employed, including hyperparameter tuning with cross-validation, and four different data treatment routes were evaluated. Using SHAP values, we extracted valuable insights from the trained models that both agree with established knowledge and extend it. For example, we identified that the bandgap energy of the constituent chemical elements is positively correlated with glass formation. When glass stability parameters and Jezica were added to the dataset, no performance improvement was observed, but model complexity decreased significantly. This result is particularly relevant for composition screening, especially in inverse design problems.
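The two headline metrics in the abstract, ROC-AUC and PR-AUC, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names and toy labels are hypothetical, and PR-AUC is approximated here by average precision (a step-wise sum over the precision-recall curve), which may differ from the integration used in the paper.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision: sum of precision weighted by each increase in recall."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    # Rank samples from the highest to the lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Samples tied at the same score share a single decision threshold.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j samples are predicted positive at this threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


if __name__ == "__main__":
    # Hypothetical labels (1 = glass, 0 = no glass) and predicted P(glass).
    labels = [1, 1, 0, 1, 0, 1, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.2]
    print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
    print(f"Average precision: {average_precision(labels, scores):.3f}")
```

With imbalanced classes the two metrics have different baselines: a random ranker scores about 0.5 in ROC-AUC, whereas its average precision is roughly the fraction of positive examples in the data.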

Paper Structure (15 sections, 6 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Periodic Table heatmap illustrating the frequency of occurrence for each chemical element in the entire dataset, before the holdout split. Color intensity represents frequency on a logarithmic scale.
  • Figure 2: Distribution of the number of elements in the CHEM dataset.
  • Figure 3: Calibration curves for the glass formation models. The plot compares the mean predicted probability against the actual fraction of positive samples in each bin. The diagonal dotted line represents perfect calibration. The Brier Score (BS) is reported for each model, where lower values indicate better probabilistic calibration.
  • Figure 4: Performance comparison of the glass formation models. (a) Receiver Operating Characteristic curves and (b) Precision-Recall curves.
  • Figure 5: Beeswarm plots of the SHAP values for the 10 most important features of models trained on the (a) CHEM, (b) FEATENG, and (c) FEATENG+GS datasets. Each SHAP value represents the contribution of a feature to the predicted probability of glass formation ($P(\text{glass})$). Features marked with the $\left\lceil \cdot\right\rceil$ operator correspond to absolute descriptors.
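The quantities plotted in Figure 3 can be sketched as follows. This is a minimal illustration that assumes uniform-width probability bins. The function names, the number of bins, and the toy data are hypothetical, and the paper's own binning strategy is not stated in this excerpt.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    if len(y_true) == 0:
        raise ValueError("empty input")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed positive fraction, count)
    for every non-empty uniform-width bin on [0, 1]."""
    prob_sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        prob_sums[idx] += p
        positives[idx] += y
        counts[idx] += 1
    return [
        (prob_sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


if __name__ == "__main__":
    # Hypothetical outcomes (1 = glass) and predicted P(glass).
    outcomes = [0, 0, 1, 1]
    probs = [0.1, 0.3, 0.7, 0.9]
    print(f"Brier score: {brier_score(outcomes, probs):.3f}")
    for mean_pred, frac_pos, count in calibration_bins(outcomes, probs, n_bins=2):
        print(f"mean predicted={mean_pred:.2f}  observed={frac_pos:.2f}  n={count}")
```

Each returned tuple is one point on a calibration curve: a well-calibrated model has points near the diagonal, where the mean predicted probability matches the observed fraction of positives. For reference, a model that always predicts 0.5 has a Brier score of 0.25.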