A Materials Map Integrating Experimental and Computational Data via Graph-Based Machine Learning for Enhanced Materials Discovery

Yusuke Hashimoto, Xue Jia, Hao Li, Takaaki Tomai

TL;DR

This work presents a framework to bridge experimental and computational materials data by constructing materials maps with the MatDeepLearn (MDL) graph-based platform. Using a graph representation learned with a message passing neural network (MPNN), the authors visualize relationships between structural features and thermoelectric performance, measured by the figure of merit $zT$, revealing two distinct branches that reflect structural complexity. Although MPNN yields well-structured maps, its predictive accuracy for material properties does not surpass that of other graph architectures, highlighting a trade-off between how clearly the map is structured and predictive performance. The resulting maps, coupled with interactive visualization and clustering analyses, offer a practical tool to guide experimental discovery and targeted synthesis, with future expansion to more properties and materials systems anticipated.

Abstract

Materials informatics (MI), emerging from the integration of materials science and data science, is expected to significantly accelerate material development and discovery. The data used in MI are derived from both computational and experimental studies; however, their integration remains challenging. In our previous study, we reported the integration of these datasets by applying a machine learning model trained on the experimental dataset to the compositional data stored in the computational database. In this study, we use the obtained datasets to construct materials maps, which visualize the relationships between material properties and structural features, aiming to support experimental researchers. The materials map is constructed using the MatDeepLearn (MDL) framework, which implements materials property prediction using graph-based representations of material structure and deep learning modeling. Through statistical analysis, we find that the MDL framework using the message passing neural network (MPNN) architecture efficiently extracts features reflecting the structural complexity of materials. However, we find that this advantage does not necessarily translate into improved accuracy in the prediction of material properties. We attribute this unexpected outcome to the high learning performance inherent in MPNN, which can contribute to the structuring of data points within the materials map.

Paper Structure

This paper contains 23 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Schematic representation of the data flow and data analysis processes employed in this study.
  • Figure 2: A materials map generated by MDL using the MPNN architecture for graph-based modeling of materials properties. The color of each data point represents the predicted-experimental $zT$ values. The position of each data point reflects the structural properties of materials extracted by the model trained with MDL. Two branches spreading laterally are labeled BR1 and BR2 in the map. The structures of Bi$_2$Te$_3$ and Ga(Ag$_{3}$Se$_{2}$)$_{3}$ are shown to illustrate the different complexities among the materials included in the two branches.
  • Figure 3: Material maps generated from the same data used in Figure 2, but color-coded by (a) energy per atom, (b) number of elements, (c) number of sites, and (d) volume.
  • Figure 4: (a) A material map colored by cluster numbers obtained via $k$-means clustering with $k = 10$. (b-k) The elemental compositional analysis results for materials included in each cluster.
  • Figure 5: Comparison of material maps generated by MDL using different graph-based architectures: (a) CGCNN, (b) MEGNet, (c) GCN, and (d) SchNet. (e) The distributions of data points in the maps shown in (a)-(d) are compared by applying kernel density estimation (KDE) to the nearest-neighbor distance (NND) of each data point.
  • ...and 3 more figures
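
The NND-by-KDE comparison named in the Figure 5 caption can be illustrated with a minimal sketch. It assumes each map is a list of 2-D points, uses synthetic Gaussian points as a stand-in for real map coordinates, and picks a normal-reference rule-of-thumb bandwidth; none of these choices, nor the helper names `nearest_neighbor_distances` and `gaussian_kde`, come from the paper.

```python
import math
import random
import statistics


def nearest_neighbor_distances(points):
    """For each 2-D point, return the Euclidean distance to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of 1-D samples with Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not the paper's setting.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Synthetic stand-in for map coordinates (hypothetical data, not from the paper).
rng = random.Random(0)
map_points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

nnd = nearest_neighbor_distances(map_points)
density = gaussian_kde(nnd)

# Evaluate the estimated NND density on an evenly spaced grid for plotting.
grid = [i * max(nnd) / 50 for i in range(51)]
curve = [density(x) for x in grid]
```

Running the same two steps on the coordinates of each architecture's map would give one density curve per map; a curve concentrated at small distances indicates tightly packed points. The brute-force neighbor search is quadratic in the number of points, so a spatial index would be a better fit for large maps.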