
Learning Thermoelectric Transport from Crystal Structures via Multiscale Graph Neural Network

Yuxuan Zeng, Wei Cao, Yijing Zuo, Fang Lyu, Wenhao Xie, Tan Peng, Yue Hou, Ling Miao, Ziyu Wang, Jing Shi

TL;DR

This work introduces TECSA-GNN, a multiscale graph neural network that encodes global composition descriptors together with atomic, bond, and angular crystal-structure information to predict the thermoelectric (TE) transport descriptors $S$ (Seebeck coefficient), $\sigma/\tau$, and $\kappa_{\rm e}/\tau$ (electrical and electronic thermal conductivity, each normalized by the relaxation time $\tau$). The model achieves state-of-the-art accuracy on a DFT-derived TE transport dataset and shows strong extrapolation, and its predictions are interpreted at the global and atomic levels using GNNExplainer and partial dependence analysis. Combined with ab initio calculations, TECSA-GNN enables efficient high-throughput screening of candidate TE materials and offers mechanistic insights into structure–property relationships, such as the roles of band gap and orbital localization. Despite limitations such as a fixed relaxation time and an isotropic reduction of the transport tensors, the framework provides a scalable path toward accelerated discovery and a deeper understanding of electronic transport in crystals.
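The two limitations named above, a fixed relaxation time and an isotropic tensor reduction, can be made concrete with a minimal sketch. It assumes the isotropic value is the trace of the 3x3 transport tensor divided by 3 and that a single constant $\tau$ (here a hypothetical default of $10^{-14}$ s) converts the $\tau$-normalized descriptors back to $\sigma$ and $\kappa_{\rm e}$; neither choice is confirmed by the summary, and the helper names are hypothetical.

```python
import numpy as np


def isotropic_average(tensor):
    """Hypothetical helper: reduce a 3x3 transport tensor to trace / 3.

    This is one common isotropic reduction; the paper's exact scheme is
    an assumption here.
    """
    t = np.asarray(tensor, dtype=float)
    if t.shape != (3, 3):
        raise ValueError("expected a 3x3 tensor")
    return np.trace(t) / 3.0


def from_reduced(s, sigma_over_tau, kappa_e_over_tau, tau=1e-14):
    """Rescale tau-normalized descriptors under a constant relaxation time.

    s in V/K, sigma/tau in 1/(Ohm m s), kappa_e/tau in W/(m K s);
    tau (seconds) is an assumed constant, not a value from the paper.
    """
    sigma = sigma_over_tau * tau          # electrical conductivity, S/m
    kappa_e = kappa_e_over_tau * tau      # electronic thermal conductivity, W/(m K)
    power_factor = s**2 * sigma           # S^2 * sigma, W/(m K^2)
    return sigma, kappa_e, power_factor


# Illustrative numbers only (not data from the paper):
sigma_tensor = np.diag([2.0e19, 2.0e19, 1.0e19])  # sigma/tau tensor, 1/(Ohm m s)
sigma, kappa_e, pf = from_reduced(200e-6, isotropic_average(sigma_tensor), 1.0e14)
```

Because $\tau$ enters only as a global scale factor, ranking candidates by $\sigma/\tau$ or $\kappa_{\rm e}/\tau$ is unaffected by the particular value assumed for it.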

Abstract

Graph neural networks (GNNs) are designed to extract latent patterns from graph-structured data, making them particularly well suited for crystal representation learning. Here, we propose a GNN model tailored for estimating electronic transport coefficients in inorganic thermoelectric crystals. The model encodes crystal structures and physicochemical properties in a multiscale manner, encompassing global, atomic, bond, and angular levels. It achieves state-of-the-art performance on benchmark datasets with remarkable extrapolative capability. By combining the proposed GNN with ab initio calculations, we successfully identify compounds exhibiting outstanding electronic transport properties and further perform interpretability analyses from both global and atomic perspectives, tracing the origins of their distinct transport behaviors. Interestingly, the decision process of the model naturally reveals underlying physical patterns, offering new insights into computer-assisted materials design.
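The multiscale encoding described in the abstract (global, atomic, bond, and angular levels) can be illustrated with a minimal sketch of graph construction for a periodic crystal. Everything here is an assumption for illustration: the function name is hypothetical, bonds are taken as all neighbours within a distance cutoff, angles as all bond pairs sharing a centre atom, and the only descriptors are atomic numbers and simple composition statistics, whereas the actual model uses richer physicochemical features.

```python
import itertools

import numpy as np


def build_multiscale_graph(lattice, frac_coords, atomic_numbers, cutoff=3.5):
    """Hypothetical multiscale descriptor builder (global, atom, bond, angle).

    lattice: (3, 3) array whose rows are lattice vectors in angstrom.
    frac_coords: (n, 3) fractional coordinates in [0, 1).
    Only the 27 nearest periodic images are searched, which is complete when
    the cutoff does not exceed the smallest perpendicular cell width.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    cart = frac @ lattice
    n = len(frac)

    # Bond level: every (i, j, image) pair within the cutoff.
    bonds = []  # (i, j, distance, displacement vector from i to j)
    for shift in itertools.product((-1, 0, 1), repeat=3):
        offset = np.array(shift, dtype=float) @ lattice
        for i in range(n):
            for j in range(n):
                if i == j and shift == (0, 0, 0):
                    continue  # skip an atom paired with itself in the home cell
                vec = cart[j] + offset - cart[i]
                dist = np.linalg.norm(vec)
                if dist <= cutoff:
                    bonds.append((i, j, dist, vec))

    # Angle level: every pair of bonds that share the same centre atom i.
    by_centre = {}
    for k, bond in enumerate(bonds):
        by_centre.setdefault(bond[0], []).append(k)
    angles = []  # (bond index a, bond index b, angle in radians)
    for ks in by_centre.values():
        for a, b in itertools.combinations(ks, 2):
            va, vb = bonds[a][3], bonds[b][3]
            cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
            angles.append((a, b, float(np.arccos(np.clip(cos, -1.0, 1.0)))))

    return {
        # Global level: mean/std of atomic number and atom count (assumed features).
        "global": np.array([z.mean(), z.std(), n], dtype=float),
        # Atom level: atomic number only (a placeholder for richer features).
        "atoms": z.reshape(-1, 1),
        "bonds": np.array([[i, j, d] for i, j, d, _ in bonds]),
        "angles": np.array(angles),
    }


# Example: CsCl-type cell (a = 4.0 angstrom); each atom has 8 neighbours at about 3.46 angstrom.
graph = build_multiscale_graph(4.0 * np.eye(3), [[0, 0, 0], [0.5, 0.5, 0.5]], [55, 17])
```

In this example the bond list has 16 entries (8 per atom) and the angle list has 56 (28 bond pairs per centre); production code would typically use an optimized neighbour search rather than this brute-force loop.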

Paper Structure

This paper contains 19 sections, 21 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: Graph representation of crystal structures. Using CuSe$_2$ as an example, a crystal structure is represented by a set of hierarchical vector descriptors at four scales, corresponding to global statistical properties, atomic sites, chemical bonds, and bond angles.
  • Figure 2: The proposed GNN framework for crystal structure representation learning. (a) Overall architecture of TECSA-GNN. Distinct colours indicate feature vectors at different scales. The superscript ${(0)}$ denotes the embedding after dimensional alignment by an MLP projection, whereas "$N$" denotes the embedding output by the $N$-th graph convolutional module (GConv #$N$). (b) Detailed schematic of a single graph convolutional module. Both "Concatenation" and "$\oplus$" denote concatenation of embeddings; "$\sum$" denotes element-wise summation implementing the residual connections; and "$\otimes$" denotes element-wise multiplication.
  • Figure 3: Error evaluation for the TECSA-GNN model. (a)-(c) Parity plots for $S$, $\log(\sigma/\tau)$, and $\log(\kappa_{\rm e}/\tau)$, with red circles for test samples and blue squares for training samples. The $x$-axis shows DFT-calculated values and the $y$-axis TECSA-GNN predictions. (d)-(f) MSE loss curves during training for $S$, $\log(\sigma/\tau)$, and $\log(\kappa_{\rm e}/\tau)$, with blue and red lines for the training and test losses, respectively.
  • Figure 4: 10-fold cross-validation performance of TECSA-GNN. (a), (c), and (e) show the dependence of $S$, $\log(\sigma/\tau)$, and $\log(\kappa_{\rm e}/\tau)$ on carrier concentration, with $T=\left\{300,600,900,1200\right\}~{\rm K}$. (b), (d), and (f) show the dependence of $S$, $\log(\sigma/\tau)$, and $\log(\kappa_{\rm e}/\tau)$ on temperature, with $n=\left\{10^{16},10^{17},10^{18},10^{19},10^{20}\right\}~{\rm cm}^{-3}$. Circular markers denote $n$-type samples and square markers denote $p$-type samples. Each marker and its error bar give the mean and standard deviation, respectively, of the MAE across the 10 folds.
  • Figure 5: $t$-SNE visualization of sample embeddings across different crystal systems. (a)-(g) Each colour denotes one of the seven crystal systems in the two-dimensional $t$-SNE plane, with grey points indicating samples from other systems. (h) The same embeddings coloured by the Seebeck coefficient $S$, showing both its magnitude and its distribution across the $t$-SNE plane.
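The module in Figure 2(b) combines concatenation, element-wise multiplication, and a residual summation. A minimal sketch of one such update is shown below, using a CGCNN-style gated convolution as a stand-in: the sigmoid gate, softplus core, sum aggregation, and omission of biases, normalization, and the angular branch are all assumptions, not the authors' exact layer, and the function name is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def gated_conv(atom_feats, bond_feats, bond_index, w_gate, w_core):
    """One CGCNN-style gated convolution with a residual connection (sketch).

    atom_feats: (n_atoms, d); bond_feats: (n_bonds, e);
    bond_index: (n_bonds, 2) integer pairs (i, j), a message from j to i.
    w_gate, w_core: (2 * d + e, d) weight matrices (no biases in this sketch).
    """
    i, j = bond_index[:, 0], bond_index[:, 1]
    # Concatenate centre-atom, neighbour-atom and bond embeddings per edge.
    z = np.concatenate([atom_feats[i], atom_feats[j], bond_feats], axis=1)
    # Element-wise product of a sigmoid gate and a softplus message.
    messages = sigmoid(z @ w_gate) * softplus(z @ w_core)
    # Sum the incoming messages for each centre atom.
    aggregated = np.zeros_like(atom_feats, dtype=float)
    np.add.at(aggregated, i, messages)
    # Residual connection: add the update to the input embedding.
    return atom_feats + aggregated


# Usage with random weights: 3 atoms (d = 4), 4 directed bonds (e = 2).
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
e = rng.normal(size=(4, 2))
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
w_g, w_c = rng.normal(size=(2, 10, 4))
h_next = gated_conv(h, e, edges, w_g, w_c)
```

`np.add.at` is used instead of fancy-index assignment so that repeated centre indices accumulate rather than overwrite; an atom with no incoming bonds keeps its embedding unchanged through the residual path.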
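Figure 4 reports the mean and standard deviation of the MAE across 10 cross-validation folds. The sketch below shows how such fold statistics can be computed; the random fold assignment, the seed, the population standard deviation (`ddof=0`), and the stand-in least-squares regressor in place of TECSA-GNN are all assumptions, and both function names are hypothetical.

```python
import numpy as np


def kfold_mae(x, y, fit_predict, k=10, seed=0):
    """Mean and standard deviation of the per-fold MAE under k-fold CV.

    fit_predict(x_train, y_train, x_test) -> predictions for x_test is a
    placeholder for training and evaluating any regressor.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random, disjoint folds
    maes = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = fit_predict(x[train_idx], y[train_idx], x[test_idx])
        maes.append(np.mean(np.abs(pred - y[test_idx])))
    maes = np.asarray(maes)
    return maes.mean(), maes.std()  # population std (ddof=0)


def linear_fit_predict(x_train, y_train, x_test):
    """Stand-in regressor: ordinary least squares with an intercept."""
    a_train = np.column_stack([x_train, np.ones(len(x_train))])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return np.column_stack([x_test, np.ones(len(x_test))]) @ coef
```

To obtain curves like those in the figure, the same statistics could be computed separately for each temperature or carrier-concentration subset of the test folds.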