Predicting Many Crystal Properties via an Adaptive Transformer-based Framework

Haosheng Xu, Dongheng Qian, Jing Wang

TL;DR

The paper demonstrates that integrating diverse material information enhances the prediction of complex material properties, paving the way for more accurate and interpretable machine learning models in materials science.

Abstract

Machine learning has revolutionized many fields, including materials science. However, predicting properties of crystalline materials using machine learning faces challenges in input encoding, output versatility, and interpretability. We introduce CrystalBERT, an adaptable transformer-based framework integrating space group, elemental, and unit cell information. This novel structure can seamlessly combine diverse features and accurately predict various physical properties, including topological properties, superconducting transition temperatures, dielectric constants, and more. CrystalBERT provides insightful interpretations of features influencing target properties. Our results indicate that space group and elemental information are crucial for predicting topological and superconducting properties, underscoring their intricate nature. By incorporating these features, we achieve 91% accuracy in topological classification, surpassing prior studies and identifying previously misclassified materials. This research demonstrates that integrating diverse material information enhances the prediction of complex material properties, paving the way for more accurate and interpretable machine learning models in materials science.

Paper Structure (10 sections, 8 equations, 5 figures, 6 tables)

This paper contains 10 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: XBERT's structure and t-SNE result. (a) XBERT's structure. A crystalline material's structural and elemental information is encoded into eight tokens. The last token is derived from the output of a CGCNN, which captures unit cell information. These tokens are then fed into the transformer encoder, and a single fully connected layer is applied to the output tokens, with outputs depending on the specific task. (b) The detailed structure of a transformer encoder layer. We use scaled dot-product attention, in which the queries, keys, and values come from different embeddings of the input vector. (c) 2D representation of the crystal feature vector by t-SNE.
  • Figure 2: Interpretation of the model. (a) The relationship between the mean space group feature value $\overline{s_{sg}}$ and the proportion of topological materials within each space group. Each point represents a specific space group, and the points exhibit a clear linear correlation. This suggests that the output token reflects, to a significant extent, the frequency of topological materials within that space group. (b)-(d) Bar charts of the weights $\sum_j{\left | w_{ij} \right |}$ assigned to different features for predicting topological property, formation energy, and superconducting transition temperature. Note that tokens 1-5 capture space group information, tokens 6-7 capture elemental information, and token 8 captures unit cell information.
  • Figure 3: Visualization of $\overline{s_{sg}}$. The numbers displayed are space group numbers. Certain space groups are absent because the TopoA database contains no corresponding materials. Within a given crystal system, space groups with higher group numbers generally have more symmetry operations and consequently higher overall symmetry.
  • Figure 4: Efficiency of XBERT. We selected four tasks, (a) topological property, (b) dielectric constant, (c) exfoliation energy, and (d) electronic band gap, and compared the reduction in training loss over time. The red and blue lines correspond to CGCNN and XBERT, respectively. XBERT's training loss converges in fewer epochs than CGCNN's.
  • Figure 5: Electronic structure and topological property of Ag$_{2}$HgSe$_{4}$Sn. (a) Crystal structure with space group $Pmn2_1$ (No. 31). (b), (c) Band structure without and with SOC; the $Z_2$ invariant is $(1;000)$ [fu2007]. (d) A single Dirac cone surface state.
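
The captions of Figures 1 and 2 mention two computations that a short sketch may make more concrete: scaled dot-product attention over the eight input tokens, and the per-token weight $\sum_j |w_{ij}|$ used for interpretation. The pure-Python example below is an illustration only, not the authors' implementation. The function names, the embedding width of 16, the random projection matrices, and the assumption that $w_{ij}$ is a matrix with one row per token are all hypothetical choices made for this sketch.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """Attention over tokens x (n_tokens x d_model).

    Queries, keys, and values are three different linear embeddings
    of the same input, as described in the Figure 1(b) caption.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # (n_tokens x n_tokens)
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(attn, v), attn


def token_importance(w):
    """Sum of |w_ij| over j for each token i (assumed layout: one row per token)."""
    return [sum(abs(v) for v in row) for row in w]


def rand_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters or token embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_tokens, d_model = 8, 16  # eight tokens as in Figure 1; width 16 is an assumption
tokens = rand_matrix(n_tokens, d_model, rng)
w_q = rand_matrix(d_model, d_model, rng)
w_k = rand_matrix(d_model, d_model, rng)
w_v = rand_matrix(d_model, d_model, rng)

out, attn = scaled_dot_product_attention(tokens, w_q, w_k, w_v)
importance = token_importance(rand_matrix(n_tokens, 4, rng))
```

Under these assumptions, `importance` holds one non-negative number per token, so entries 0-4 would correspond to the space group tokens, entries 5-6 to the elemental tokens, and entry 7 to the unit cell token.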