Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study

Sadman Sadeed Omee, Nihang Fu, Rongzhi Dong, Ming Hu, Jianjun Hu

TL;DR

This work addresses the gap between typical i.i.d. evaluations and real-world OOD generalization for inorganic materials property prediction by introducing a dedicated OOD benchmark using structure-based GNNs. It systematically assesses eight GNNs across three MatBench datasets under five distinct OOD target schemes with 50-fold cross-validation, revealing a generalization gap and no single model that dominates all OOD scenarios. Notably, CGCNN, ALIGNN, and DeeperGATGNN show more robust OOD performance in several cases, while latent-space analyses (e.g., t-SNE) shed light on why certain architectures cope better with OOD data. The findings highlight the need for domain adaptation or meta-learning approaches to achieve reliable OOD predictions and set the stage for more robust, real-world materials discovery tools, with practical implications for screening and designing novel materials.

Abstract

In real-world materials research, machine learning (ML) models are usually expected to predict and discover novel, exceptional materials that deviate from known materials. It is thus a pressing question how to objectively evaluate ML model performance when predicting properties of out-of-distribution (OOD) materials, that is, materials drawn from a distribution different from that of the training set. The traditional evaluation of materials property prediction models through random splitting of the dataset frequently yields artificially high performance estimates because typical materials datasets are inherently redundant. Here we present a comprehensive benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction. We formulate five different categories of OOD ML problems for three benchmark datasets from the MatBench study. Our extensive experiments show that, on average, current state-of-the-art GNN algorithms significantly underperform on the OOD property prediction tasks compared to their baselines in the MatBench study, demonstrating a crucial generalization gap in realistic materials prediction tasks. We further examine the physical latent spaces of these GNN models, identify why CGCNN, ALIGNN, and DeeperGATGNN are significantly more robust on OOD data than the current best models in the MatBench study (coGN and coNGN), and provide insights for improving their performance.

Paper Structure (20 sections, 10 equations, 15 figures, 5 tables)

Figures (15)

  • Figure 1: The overall framework and workflow of our OOD materials benchmark. First, we generate OOD test sets for the three datasets chosen, where we propose five different methods to split each dataset into 50 folds, ensuring the test set varies in distribution from the training set in each fold. Next, we perform preprocessing steps such as input representation, data scaling, etc. for the GNNs. Subsequently, we train the GNN models and compile the test set results. After that, we evaluate the performance over the 50 folds for each OOD target generation method. We conduct additional analyses on the obtained results, including investigating the physical latent spaces of the GNN models to understand their characteristics in predicting properties of OOD materials.
  • Figure 2: Distribution of standard cross-validation (CV) test set and five OOD test sets using various target generation methods for the dielectric dataset. (a) 50-fold CV (with random splitting) of the whole dielectric dataset with 4,764 samples represented by cross symbols with 50 different colors. (b) Leave-one-cluster-out target (LOCO) clusters. (c) In SparseXsingle, 50 test samples are represented by cross symbols with 50 different colors, and grey points represent the remaining samples. (d) In SparseYsingle, 50 test samples are represented by cross symbols with 50 different colors, and grey points represent the remaining samples. (e) SparseXcluster displays 50 test clusters represented by cross symbols with 50 different colors, and grey points represent the remaining samples. (f) SparseYcluster displays 50 test clusters represented by cross symbols with 50 different colors, and grey points represent the remaining samples.
  • Figure 3: LOCO
  • Figure 4: SparseXcluster
  • Figure 5: SparseYcluster
  • ...and 10 more figures
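
The leave-one-cluster-out (LOCO) idea named in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the clustering algorithm (a plain k-means), the 2-D toy feature vectors, and the helper names `kmeans` and `loco_folds` are all assumptions made for illustration. The only point it demonstrates is that each fold holds out one whole cluster, so the test samples come from a region of feature space the model never saw during training.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from k distinct samples
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def loco_folds(labels):
    """Yield (train_idx, test_idx) pairs, holding out one whole cluster per fold."""
    for c in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == c]
        train = [i for i, lab in enumerate(labels) if lab != c]
        yield train, test


# Hypothetical toy features: three well-separated blobs standing in for
# structure-derived descriptors of materials.
rng = random.Random(42)
features = [
    (rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    for _ in range(20)
]

labels = kmeans(features, k=3)
folds = list(loco_folds(labels))
for fold_id, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {fold_id}: {len(train_idx)} train / {len(test_idx)} test")
```

In contrast to random splitting, no test sample here shares a cluster with any training sample. If scikit-learn is available, its `LeaveOneGroupOut` splitter produces the same kind of folds when given the cluster labels as groups.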