Wyckoff Transformer: Generation of Symmetric Crystals

Nikita Kazeev, Wei Nong, Ignat Romanov, Ruiming Zhu, Andrey Ustyuzhanin, Shuya Yamazaki, Kedar Hippalgaonkar

TL;DR

Wyckoff Transformer (WyFormer) addresses the challenge of generating stable, symmetry-valid crystals by conditioning generation on space-group symmetry and encoding Wyckoff positions with a site-symmetry–aware, permutation-invariant tokenization. The approach leverages a Transformer encoder and a spherical-harmonics descriptor to produce an autoregressive, discrete representation that is invariant to coset choices and can predict properties without full atomic coordinates. Empirical results show state-of-the-art symmetry-conditioned generation, strong novelty and distribution fidelity, and competitive property predictions against methods using full structures, with substantial gains in inference speed. This framework enables rapid exploration of symmetry-constrained crystal spaces and holds promise for accelerating materials discovery while delivering physically meaningful structures and property estimates.

Abstract

A crystal's symmetry plays a fundamental role in determining its physical, chemical, and electronic properties, such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. However, existing generative models often address symmetry inadequately, which makes the consistent generation of stable and symmetrically valid crystal structures a significant challenge. We introduce WyFormer, a generative model that tackles this directly by formally conditioning on space group symmetry. It does so by using Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on a Transformer encoder without positional encoding. Extensive experimentation demonstrates WyFormer's compelling combination of attributes: it achieves best-in-class symmetry-conditioned generation, incorporates a physics-motivated inductive bias, produces structures with competitive stability, predicts material properties with competitive accuracy even without atomic coordinates, and exhibits unparalleled inference speed.
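To illustrate why dropping positional encoding makes an attention-based model insensitive to input order, here is a minimal sketch in plain Python. It is not the WyFormer architecture: it assumes a single attention head with identity projections and no learned weights, and `attend` and `add_positions` are hypothetical helper names introduced only for this example.

```python
import math
import random


def attend(query, context):
    """Single-head dot-product attention of one query vector over a list of context vectors."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim) for vec in context]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * vec[i] for w, vec in zip(weights, context)) / total for i in range(dim)]


def add_positions(sequence):
    """Toy positional encoding (an assumption): shift each vector by an index-dependent offset."""
    return [[x + 0.1 * i for x in vec] for i, vec in enumerate(sequence)]


rng = random.Random(0)
context = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
query = [rng.uniform(-1, 1) for _ in range(4)]
reordered = context[::-1]  # same set of vectors, different order

# Without positional information the attention output depends only on the set of inputs.
plain_a = attend(query, context)
plain_b = attend(query, reordered)
order_invariant = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(plain_a, plain_b))

# With index-dependent offsets, the same reordering generally changes the output.
pos_a = attend(query, add_positions(context))
pos_b = attend(query, add_positions(reordered))
max_gap_with_positions = max(abs(a - b) for a, b in zip(pos_a, pos_b))

print("order-invariant without positions:", order_invariant)
print("largest gap with positions:", max_gap_with_positions)
```

The comparison uses a small tolerance because reordering the inputs changes the floating-point summation order, even though the result is mathematically identical.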

Paper Structure

This paper contains 43 sections, 1 equation, 16 figures, 14 tables, 3 algorithms.

Figures (16)

  • Figure 1: A toy 2D crystal [Wyckoff_Set_Regression]. It contains four mirror lines and one rotation center. There are four Wyckoff positions, illustrated by shading. Magenta is the Wyckoff position that is invariant under all the transformations and contains only a single point; red and yellow lie on the mirror lines; teal is invariant only under the identity transformation and occupies the rest of the space. Markers of the corresponding colors show one of the possible locations of an atom belonging to the corresponding Wyckoff position.
  • Figure 2: Distribution of space groups in the MP-20 dataset [xie2021crystal] and in generated samples. The 10 space groups most frequent in MP-20 are labeled; 98% of MP-20 structures belong to symmetry groups other than P1. Plot design by [levy2024symmcd]. The comparison of the distribution of generated samples' space groups to the ground truth distribution is presented in Table \ref{tab:evaluation-symmetry}, column Space Group $\chi^2$.
  • Figure 3: Two equivalent Wyckoff representations of SrTiO$_3$ (https://next-gen.materialsproject.org/materials/mp-4651), depending on the choice of lattice center: [Ti, (m-3m, 0)], [Sr, (m-3m, 1)], [O, (4/mm.m, 1)] and [Ti, (m-3m, 1)], [Sr, (m-3m, 0)], [O, (4/mm.m, 0)].
  • Figure 4: An example of structure tokenization, TmMgHg$_2$ (https://next-gen.materialsproject.org/materials/mp-865981).
  • Figure 5: Model training pipeline. The crystal is converted into a token sequence in which the first token is the space group number, followed by token triplets in the order atom, site symmetry, and enumeration. The triplets are then randomly shuffled. The number of fully known Wyckoff positions and the part of the next triplet to be predicted are sampled at random; unknown tokens are masked and unknown Wyckoff positions are removed. The tokens are embedded using simple lookup tables; for each Wyckoff position, the embeddings of its tokens are concatenated along the embedding dimension. A linear layer mixes the features to provide homogeneous input to the multiple attention heads. The sequence is passed through the Transformer encoder. An MLP is applied to the last token of the output sequence. The loss is the cross-entropy between the prediction and the true value of the token being predicted.
  • ...and 11 more figures
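
The data-preparation steps described in the Figure 5 caption (shuffle the triplets, sample how many Wyckoff positions are fully known and which part of the next triplet to predict, then mask and truncate) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: `make_training_example` and the `<mask>` token are hypothetical names, the space group number 221 for SrTiO$_3$ is an assumption (the Figure 3 caption does not state it), and the case where every position is already known is left out for brevity. Embedding, the Transformer encoder, and the loss are not covered.

```python
import random

MASK = "<mask>"  # hypothetical placeholder for tokens the model must not see


def make_training_example(space_group, triplets, rng):
    """Build one (context, target) pair from (atom, site symmetry, enumeration) triplets."""
    order = triplets[:]
    rng.shuffle(order)                    # random order of Wyckoff positions
    n_known = rng.randrange(len(order))   # number of fully known positions, 0 .. n-1
    part = rng.randrange(3)               # 0 = atom, 1 = site symmetry, 2 = enumeration
    next_triplet = order[n_known]
    # Parts before `part` are visible; the predicted part and everything after it are masked.
    partial = tuple(tok if i < part else MASK for i, tok in enumerate(next_triplet))
    return {
        "space_group": space_group,
        "context": order[:n_known] + [partial],  # later positions are removed entirely
        "part": part,
        "target": next_triplet[part],
    }


# Triplets taken from the first SrTiO3 representation in the Figure 3 caption.
srtio3 = [("Ti", "m-3m", 0), ("Sr", "m-3m", 1), ("O", "4/mm.m", 1)]
example = make_training_example(221, srtio3, random.Random(0))  # 221 is an assumption
print(example)
```

Because the triplets are reshuffled on every call, repeated calls with the same crystal yield different contexts and targets, which is how a single structure can supply many training examples.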