Accelerating Material Property Prediction using Generically Complete Isometry Invariants

Jonathan Balasingham, Viktor Zamaraev, Vitaliy Kurlin

TL;DR

This work tackles crystal property prediction by representing periodic crystals with the Pointwise Distance Distribution ($PDD$), an isometry invariant that is generically complete and continuous under Earth Mover's Distance ($EMD$). It introduces the Periodic Set Transformer (PST), which integrates $PDD$ with atom-type composition via a $PDD$ Encoding, using a $Q$-$K$-$V$ attention mechanism and weighted pooling. On Materials Project and Jarvis-DFT datasets, the PST achieves accuracy on par with or better than state-of-the-art graph-based methods while offering substantially faster training and inference. The approach demonstrates that a complete, invariant structural representation can be effectively combined with composition information to scale property prediction for unbounded periodic crystals.

Abstract

Periodic material or crystal property prediction using machine learning has grown popular in recent years as it provides a computationally efficient replacement for classical simulation methods. A crucial first step for any of these algorithms is the representation used for a periodic crystal. While similar objects like molecules and proteins have a finite number of atoms and their representation can be built based upon a finite point cloud interpretation, periodic crystals are unbounded in size, making their representation more challenging. In the present work, we adapt the Pointwise Distance Distribution (PDD), a continuous and generically complete isometry invariant for periodic point sets, as a representation for our learning algorithm. The PDD distinguished all (more than 660 thousand) periodic crystals in the Cambridge Structural Database as purely periodic sets of points without atomic types. We develop a transformer model with a modified self-attention mechanism that combines PDD with compositional information via a spatial encoding method. This model is tested on the crystals of the Materials Project and Jarvis-DFT databases and shown to produce accuracy on par with state-of-the-art methods while being several times faster in both training and prediction time.
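As a rough illustration of the representation described in the abstract, the sketch below computes a PDD for a small periodic point set. It assumes the usual construction: for each motif point, take the sorted distances to its k nearest neighbours in the infinite periodic set, collapse identical rows, and weight each row by its relative frequency. The function name `pdd`, the `shells` parameter, and the rounding tolerance are illustrative choices that do not come from the paper, and the fixed number of shells is only adequate when the k nearest neighbours lie within that many cell translations.

```python
import itertools
import numpy as np

def pdd(motif, cell, k, shells=3):
    """Sketch of a Pointwise Distance Distribution (assumed construction).

    motif : (m, d) Cartesian coordinates of the points in one unit cell
    cell  : (d, d) matrix whose rows are the lattice vectors
    k     : number of nearest neighbours per point
    shells: hypothetical cutoff on integer cell translations per axis
    Returns an array whose first column holds the row weights and whose
    remaining k columns hold sorted neighbour distances.
    """
    motif = np.asarray(motif, dtype=float)
    cell = np.asarray(cell, dtype=float)
    m, d = motif.shape
    # All integer translations in [-shells, shells]^d, mapped to Cartesian shifts.
    ints = np.array(list(itertools.product(range(-shells, shells + 1), repeat=d)))
    shifts = ints @ cell
    # Periodic images of every motif point, flattened to (n_shifts * m, d).
    images = (motif[None, :, :] + shifts[:, None, :]).reshape(-1, d)
    # Distances from each motif point to every image, sorted per row.
    dists = np.linalg.norm(motif[:, None, :] - images[None, :, :], axis=-1)
    dists.sort(axis=1)
    rows = np.round(dists[:, 1:k + 1], 8)  # drop the zero self-distance
    # Collapse identical rows; np.unique also orders them lexicographically.
    uniq, counts = np.unique(rows, axis=0, return_counts=True)
    return np.column_stack([counts / m, uniq])

# 1D example with period 1 and points at 0, 0.25, 0.5:
# two distinct rows remain, (0.25, 0.25) with weight 1/3
# and (0.25, 0.5) with weight 2/3.
example = pdd([[0.0], [0.25], [0.5]], [[1.0]], k=2)
```

Collapsing repeated rows is what keeps the output independent of the chosen unit cell in this sketch: doubling the cell doubles every row count and m alike, so the weights are unchanged.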

Paper Structure

This paper contains 21 sections, 1 theorem, 16 equations, 6 figures, and 14 tables.

Key Result

Lemma A.1

Let $A$ and $B$ be weighted multisets each containing elements from the set $S = \{{\bm{x}}_1, \ldots, {\bm{x}}_n\}$. Each element ${\bm{x}}_i \in \mathbb{R}^{1 \times n}$ occurs with multiplicity $m_i^{(a)} \in \mathbb{N}^{+}$ and $m_i^{(b)} \in \mathbb{N}^{+}$ in $A$ and $B$ respectively. Each eleme

Figures (6)

  • Figure 1: Classification of geometric descriptors for periodic crystals by the properties they possess. Cell parameters consist of the unit cell lengths and angles, but they are ambiguous because any crystal has infinitely many valid unit cells. Space group is a label defined by the symmetry relations the crystal exhibits, but it is sensitive to atomic perturbations. Equivariant GNNs cannot be used to distinguish periodic structures. PDF needs additional smoothing to retain continuity, which introduces more parameters. The PDD is invariant, generically complete, and continuous under the EMD.
  • Figure 2: Overview of the architecture of the Periodic Set Transformer. PDD encoding is used to combine the structural information in the PDD with atomic types. The weights of the PDD are incorporated in the attention mechanism and during the pooling of the embeddings to define the multiplicity of the input set.
  • Figure 3: The unit cell of Lutetium-Silicon; Silicon is colored in teal and Lutetium in magenta.
  • Figure 4: Multi-dimensional scaling (MDS) projections (Borg2005_mds) onto $\mathbb{R}^2$ and $\mathbb{R}^3$ of the pairwise distances between crystals. Subfigure (a) projects three types of crystals using the Earth Mover's Distance between their PDDs for $k=10$. Subfigure (b) shows the MDS projection of pairwise distances between PDDs at $k=100$ for one hundred random samples of the T2 crystals, colored by lattice energy in kJ/mol.
  • Figure 5: (a) The distribution of the lattice energies of the T2, S2 and P1 crystals. (b) The predictions of T2, S2, and P1 compared to the ground truth lattice energies in kJ/mol.
  • ...and 1 more figure
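
The caption of Figure 2 says that the PDD weights enter both the attention mechanism and the pooling step so that they define the multiplicity of the input set. One plausible way to realise this, shown here as a minimal NumPy sketch rather than the authors' implementation, is to scale each exponentiated attention score by the weight of the row being attended to and to pool embeddings by a weighted sum. The names `weighted_attention` and `weighted_pool`, the single-head layout, and the projection matrices `Wq`, `Wk`, `Wv` are assumptions made for illustration.

```python
import numpy as np

def weighted_attention(x, w, Wq, Wk, Wv):
    """Single-head self-attention where row j counts with weight w[j].

    x : (n, d) row embeddings, w : (n,) positive weights summing to 1.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    # Weighted softmax: a row with weight w[j] acts like w[j]-many copies.
    a = w[None, :] * np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)
    return a @ v

def weighted_pool(h, w):
    """Weighted mean of the row embeddings h using weights w."""
    return (w[:, None] * h).sum(axis=0) / w.sum()

# A collapsed set {x0 with weight 2/3, x1 with weight 1/3} should give the
# same pooled output as the expanded set {x0, x0, x1} with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
w = np.array([2 / 3, 1 / 3])
x_rep = np.vstack([x[0], x[0], x[1]])
w_rep = np.full(3, 1 / 3)
pooled = weighted_pool(weighted_attention(x, w, Wq, Wk, Wv), w)
pooled_rep = weighted_pool(weighted_attention(x_rep, w_rep, Wq, Wk, Wv), w_rep)
```

In this sketch `pooled` and `pooled_rep` agree up to floating-point error, which is the multiplicity behaviour the caption describes: collapsing repeated rows into one weighted row does not change the output.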

Theorems & Definitions (4)

  • Definition 2.1: Periodic Point Set
  • Definition 2.2: Pointwise Distance Distribution
  • Lemma A.1
  • proof