Transformer-Enhanced Variational Autoencoder for Crystal Structure Prediction

Ziyi Chen, Yang Yuan, Siming Zheng, Jialong Guo, Sihan Liang, Yangang Wang, Zongguo Wang

TL;DR

This work introduces TransVAE-CSP, a Transformer-Enhanced Variational Autoencoder for crystal structure prediction that jointly reconstructs and generates crystal structures. It advances representation learning by combining adaptive distance expansion with irreducible representations and an encoder built on an equivariant dot-product attention mechanism to capture $E(3)$/$SE(3)$ symmetry. The approach is validated on carbon_24, perov_5, and mp_20, showing superior reconstruction and generation performance relative to several baselines, with dataset-specific RBF choices further enhancing results. The findings demonstrate a robust, distribution-focused CSP framework that enables efficient crystal design and optimization, with future work aimed at closing gaps with diffusion models and enabling composition-conditioned generation.

Abstract

Crystal structure forms the foundation for understanding the physical and chemical properties of materials. Generative models have emerged as a new paradigm in crystal structure prediction (CSP); however, accurately capturing key characteristics of crystal structures, such as periodicity and symmetry, remains a significant challenge. In this paper, we propose a Transformer-Enhanced Variational Autoencoder for Crystal Structure Prediction (TransVAE-CSP), which learns the characteristic distribution space of stable materials, enabling both the reconstruction and generation of crystal structures. TransVAE-CSP integrates adaptive distance expansion with irreducible representations to effectively capture the periodicity and symmetry of crystal structures, and its encoder is a transformer network based on an equivariant dot-product attention mechanism. Experimental results on the carbon_24, perov_5, and mp_20 datasets demonstrate that TransVAE-CSP outperforms existing methods in structure reconstruction and generation tasks under various modeling metrics, offering a powerful tool for crystal structure design and optimization.

Paper Structure

This paper contains 33 sections, 7 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: TransVAE-CSP, based on the Variational Autoencoder (VAE) paradigm, excels in both crystal structure reconstruction and ab initio generation tasks. Experimental results show exceptional performance across evaluation metrics. Compared to previous work [xie2021crystal], we introduce and validate innovations in crystal structure representation and the encoder network.
  • Figure 2: Overview of TransVAE-CSP. Training: Given a crystal $M=(A,N,X,L)$, the model extracts feature information through the Embedding layer and the equivariant attention encoder to obtain the latent variable $z$, which is used as a condition to guide the output of both the Predictor and the denoising Decoder. The loss function comprises three parts: $\mathcal{L}_{Pred}$, $\mathcal{L}_{Dec}$, and $\mathcal{L}_{KL}$. Generation: The variable $z$ is sampled from a multi-dimensional standard normal distribution, $z \sim \mathcal{N}(0,I)$, and the Predictor and the denoising Decoder then perform ab initio generation, producing a crystal structure that aligns with the feature space of the training samples. Note: Our work optimizes the CDVAE algorithm, so the framework follows Xie et al. [xie2021crystal].
  • Figure 3: The equivariant dot-product attention network, which is the core network of the Transformer. “DTP” stands for depth-wise tensor product; $\bigoplus$ denotes addition, $\bigotimes$ denotes multiplication, and $\sum$ denotes the scatter operation.
  • Figure 4: Embedding block. Compared to previous work [liao2023equiformer], our adaptations are highlighted in red. RBF refers to the feature vector obtained by expanding the distance with a radial basis function.
  • Figure 5: Comparison of different RBFs on Perov_5, Carbon_24, and MP_20. Panel a) shows the curves obtained by training on Perov_5, where the model using the Bessel RBF trains most effectively. Similarly, the most effective choices in the other panels are b) the Hybrid RBF on Carbon_24 and c) the Gaussian RBF on MP_20.
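
The captions for Figures 4 and 5 refer to expanding a distance into an RBF feature vector. As a rough illustration of what such an expansion looks like, the sketch below implements two commonly used forms, a Gaussian basis and a sine-based Bessel basis, in NumPy. The function names, the cutoff of 5.0, the number of basis functions, and the exact functional forms are assumptions chosen for illustration; they are not taken from the paper, and the Hybrid RBF is not shown because its definition does not appear in this summary.

```python
import numpy as np

def gaussian_rbf(d, cutoff=5.0, num_basis=8):
    """Expand distances on Gaussians with evenly spaced centers in [0, cutoff].

    Hypothetical helper: the centers and width are illustrative choices.
    """
    d = np.asarray(d, dtype=float)[..., None]      # shape (..., 1)
    centers = np.linspace(0.0, cutoff, num_basis)  # shape (num_basis,)
    width = centers[1] - centers[0]                # spacing used as the width
    return np.exp(-0.5 * ((d - centers) / width) ** 2)

def bessel_rbf(d, cutoff=5.0, num_basis=8):
    """Expand distances on sqrt(2/c) * sin(n * pi * d / c) / d for n = 1..num_basis.

    Hypothetical helper: a common sine-based Bessel form, assumed here.
    """
    d = np.asarray(d, dtype=float)[..., None]      # shape (..., 1)
    n = np.arange(1, num_basis + 1)                # frequencies 1..num_basis
    d_safe = np.clip(d, 1e-8, None)                # avoid division by zero
    return np.sqrt(2.0 / cutoff) * np.sin(n * np.pi * d_safe / cutoff) / d_safe

# Example: three distances, each mapped to an 8-dimensional feature vector.
distances = np.array([1.2, 2.5, 4.0])
gaussian_features = gaussian_rbf(distances)
bessel_features = bessel_rbf(distances)
print(gaussian_features.shape, bessel_features.shape)
```

In this sketch each scalar distance becomes a fixed-length vector, so the choice of basis changes only how a distance is encoded, not the shape of the embedding input. That is one way to read the dataset-specific comparison in Figure 5, though the paper's actual bases may differ from the forms assumed here.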