On the Robustness of Machine Learning Models in Predicting Thermodynamic Properties: a Case of Searching for New Quasicrystal Approximants

Fedor S. Avilov, Roman A. Eremin, Semen A. Budennyy, Innokentiy S. Humonen

TL;DR

This work composed a series of nested datasets of intermetallic quasicrystal approximants, trained various machine learning models on each of them, showed the advantage of pre-training, and proposed a simple yet effective sequential training trick to increase stability.

Abstract

Although artificial intelligence-assisted modeling of disordered crystals is a widely used and well-tested method of new materials design, the issues of its robustness, reliability, and stability are still unresolved and insufficiently discussed. To highlight them, in this work we composed a series of nested datasets of intermetallic quasicrystal approximants and trained various machine learning models on each of them. Our qualitative and, more importantly, quantitative assessment of the differences in the predictions clearly shows that different reasonable changes in the training sample can lead to completely different sets of predicted potentially new materials. We also showed the advantage of pre-training and proposed a simple yet effective trick of sequential training to increase stability.
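The abstract does not say which metric the authors use to quantify how much the predicted sets of new materials differ. As a hedged illustration only, one simple option is the Jaccard index between the top-k candidates (the systems with the lowest predicted energy above hull) from models trained on different subdatasets. The function names, the top-k selection rule, and the toy numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def top_k_candidates(predictions: dict[str, float], k: int) -> set[str]:
    """Return the k systems with the lowest predicted energy above hull.

    Hypothetical selection rule: ties are broken by system name so the
    result is deterministic.
    """
    ranked = sorted(predictions, key=lambda s: (predictions[s], s))
    return set(ranked[:k])


def pairwise_stability(runs: dict[str, dict[str, float]], k: int) -> dict[tuple[str, str], float]:
    """Jaccard overlap of the top-k sets for every pair of training runs.

    `runs` maps a run label (e.g. a subdataset limit) to that model's
    predictions {system name: predicted energy above hull, eV/atom}.
    """
    tops = {name: top_k_candidates(pred, k) for name, pred in runs.items()}
    return {(a, b): jaccard(tops[a], tops[b]) for a, b in combinations(sorted(tops), 2)}


if __name__ == "__main__":
    # Toy predictions from two hypothetical models (not data from the paper).
    runs = {
        "limit=1": {"A": 0.01, "B": 0.03, "C": 0.10, "D": 0.20},
        "limit=2": {"A": 0.02, "B": 0.15, "C": 0.04, "D": 0.30},
    }
    print(pairwise_stability(runs, k=2))
```

In this toy case the two models share only one of their top-2 candidates, so the overlap is 1/3. A value near 1 would indicate that the candidate list is stable under the change in training data.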

Paper Structure

This paper contains 20 sections, 4 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Number of systems with a given number of defects on the A5 site of the original crystal structure. The subdataset corresponding to the limit $x$ comprises all systems with at most $x$ defects on the A5 site.
  • Figure 2: Histogram of energies above the hull for the Sc-Pd dataset
  • Figure 3: Test RMSEs of the gradient boosting models
  • Figure 4: Box plots of the test RMSEs (eV/atom) obtained for different classical models on the limit=$x$ subdatasets for each dopant
  • Figure 5: Random Forest Test RMSE, eV/atom
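Figure 1 describes how the nested subdatasets are defined: the subdataset for limit $x$ contains every system with at most $x$ defects on the A5 site. As a result, each subdataset contains all the smaller ones. The sketch below shows that construction together with the test RMSE used in the other figures. It is a minimal illustration under stated assumptions. The `System` record, its field names, and the toy systems are hypothetical. The paper's actual data format and training pipeline are not shown here.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    # Hypothetical record: an identifier plus the number of defects on the
    # A5 site of the original crystal structure (the quantity in Figure 1).
    name: str
    a5_defects: int


def nested_subdatasets(systems: list[System], limits: list[int]) -> dict[int, list[System]]:
    """Map each limit x to all systems with at most x defects on A5.

    Because the filter is "<= x", the subdataset for a smaller limit is
    always contained in the subdataset for a larger one.
    """
    return {x: [s for s in systems if s.a5_defects <= x] for x in sorted(limits)}


def defect_histogram(systems: list[System]) -> dict[int, int]:
    """Number of systems per defect count (the bars of a Figure 1-style plot)."""
    return dict(sorted(Counter(s.a5_defects for s in systems).items()))


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error, in the units of the targets (e.g. eV/atom)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Toy systems, not data from the paper.
    systems = [System("s0", 0), System("s1", 1), System("s2", 1), System("s3", 2)]
    for x, subset in nested_subdatasets(systems, [0, 1, 2]).items():
        print(f"limit={x}: {[s.name for s in subset]}")
    print(defect_histogram(systems))
```

Because the subdatasets are nested, comparing models trained at successive limits isolates the effect of adding the systems with exactly one more defect.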