Graph Learning Metallic Glass Discovery from Wikipedia

K. -C. Ouyang, S. -Y. Zhang, S. -L. Liu, J. Tian, Y. -H. Li, H. Tong, H. -Y. Bai, W. -H. Wang, Y. -C. Hu

TL;DR

This study proposes a new paradigm for harvesting new amorphous materials and beyond with artificial intelligence. It uses Wikipedia embeddings from different languages to assess how well natural languages support materials design.

Abstract

Efficient synthesis of new materials is in high demand across many research fields. However, the process is usually slow and expensive, especially for metallic glasses, whose formation depends strongly on finding optimal combinations of multiple elements that resist crystallization. As a result, only several thousand candidates have been explored in the vast materials space since 1960. Recently, data-driven approaches powered by advanced machine learning techniques have provided alternative routes for intelligent materials design. Because of data scarcity and immature material encoding, conventional tabular data are usually mined with statistical learning algorithms, which yields limited predictive power and generalizability. Here, we propose more sophisticated learning from network representations of materials. The node elements are encoded from Wikipedia by a language model. Graph neural networks with versatile architectures are designed to serve as recommendation systems that explore hidden relationships among materials. By employing Wikipedia embeddings from different languages, we assess the capability of natural languages in materials design. Our study proposes a new paradigm for harvesting new amorphous materials and beyond with artificial intelligence.

Paper Structure

This paper contains 21 sections, 15 equations, 7 figures.

Figures (7)

  • Figure 1: Material encoding strategies. a, Conventional material representation by manually collected physical properties of elements and alloys. There are generally 47 elemental features and 4 alloy features. The right panel shows the data for element Cu as an example. b, Encoding materials from Wikipedia. The content of a specific Wikipedia page and its links to other pages are processed by the Wikipedia2Vec model. The right panel shows the feature matrix of Cu. c, The hierarchical clustering dendrogram of element embeddings from Wikipedia. d, Stacked bar plot of the number of MGs in which each element is involved. The upper segment (dotted pattern) represents ternary systems, while the lower segment ($-45^\circ$ diagonal hatches) denotes binary systems. Some elements only show up in ternary MGs. The bars are colored by the periodic group of the corresponding element.
  • Figure 2: Correlation analysis of Wikipedia embeddings. a, Heatmap of the 100-dimensional Wikipedia embeddings for the 47 elements. The hierarchical clustering dendrogram at the top demonstrates the weak correlation between components of the embeddings. Only some of the labels are shown for clarity. b-d, Dimensionality reduction of element embeddings using PCA (b), t-SNE (c) and UMAP (d), respectively. The elements are illustrated by the top-2 principal components. The data are scattered in all panels, corroborating the importance of each component in the Wikipedia embeddings. A cluster of the rare earth elements is present.
  • Figure 3: Graph-based recommendation system for MGs. The input elemental Wikipedia embeddings are mapped to node representations by MLP layers. Through post-processing of the embeddings (e.g. by a Transformer) and advanced message passing, different GNN models (e.g. TransGNN) are designed to generate the final embeddings. A recommendation system is thus built to score the appearance of a link (binary system) or a triangle (ternary system) in the material networks. (PD: inner product; HDM: Hadamard product)
  • Figure 4: Visualization of the recommendation scores for MG prediction. a, Two-dimensional heatmap of the scores for binary MG recommendation by TransGNN. The red squares mark those previously validated by experiments. b, Three-dimensional scatter plot of the scores for ternary MG recommendation by TransGNN. The red circles show example experimental validations from 16 elements, chosen for clarity.
  • Figure 5: Performance of the B2B recommendation system. a, Planar visualization of the binary network with a link recommendation highlighted. b, Model performance evaluation for three GNN architectures over the English Wikipedia embeddings. c-e, Metrics for multi-lingual recommendation systems with GCN-PD (c), NGCF-PD (d), and TransGNN-HDM (e).
  • ...and 2 more figures
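
The scoring step named in the Figure 3 caption (PD: inner product; HDM: Hadamard product) can be illustrated with a minimal sketch. This is not the authors' implementation. The caption states only that a link (binary system) or a triangle (ternary system) is scored from the final node embeddings, so several pieces below are assumptions: the sigmoid squashing, the readout weights `w` in the HDM head, the rule of averaging three pairwise scores for a triangle, and the random toy vectors that stand in for trained embeddings. All function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Squash a raw score into [0, 1]. Assumption: the source names no squashing function."""
    return 1.0 / (1.0 + math.exp(-x))


def pd_score(u, v):
    """PD head: inner product of two node embeddings."""
    return sigmoid(sum(a * b for a, b in zip(u, v)))


def hdm_score(u, v, w):
    """HDM head: Hadamard (element-wise) product followed by a weighted sum.

    The readout weights `w` are hypothetical; a trained model would learn them.
    """
    hadamard = [a * b for a, b in zip(u, v)]
    return sigmoid(sum(wi * h for wi, h in zip(w, hadamard)))


def triangle_score(u, v, t, pair_score):
    """Assumed ternary rule: mean of the three pairwise link scores."""
    pairs = [(u, v), (v, t), (u, t)]
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)


# Toy stand-ins for final GNN node embeddings (random, not real Wikipedia data).
rng = random.Random(0)
dim = 4
emb = {el: [rng.gauss(0.0, 1.0) for _ in range(dim)] for el in ("Cu", "Zr", "Al")}

link_pd = pd_score(emb["Cu"], emb["Zr"])
link_hdm = hdm_score(emb["Cu"], emb["Zr"], [1.0] * dim)
tri_pd = triangle_score(emb["Cu"], emb["Zr"], emb["Al"], pd_score)
print(f"binary PD={link_pd:.3f}  binary HDM={link_hdm:.3f}  ternary PD={tri_pd:.3f}")
```

With all-ones readout weights the HDM head reduces to the PD head, which is why the two binary scores printed above coincide. Non-uniform weights would let the HDM head emphasize some embedding dimensions over others.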