Accelerating Material Property Prediction using Generically Complete Isometry Invariants
Jonathan Balasingham, Viktor Zamaraev, Vitaliy Kurlin
TL;DR
This work tackles crystal property prediction by representing periodic crystals with the Pointwise Distance Distribution (PDD), an isometry invariant that is generically complete and continuous under the Earth Mover's Distance (EMD). It introduces the Periodic Set Transformer (PST), which combines the PDD with atom-type composition through a PDD encoding, a $Q$-$K$-$V$ attention mechanism, and weighted pooling. On the Materials Project and Jarvis-DFT datasets, the PST achieves accuracy on par with or better than state-of-the-art graph-based methods while being several times faster in both training and prediction. The approach demonstrates that a complete, invariant structural representation can be combined with composition information to scale property prediction to periodic crystals, which are unbounded in size.
Abstract
Periodic material or crystal property prediction using machine learning has grown popular in recent years as it provides a computationally efficient replacement for classical simulation methods. A crucial first step for any of these algorithms is the representation used for a periodic crystal. While similar objects like molecules and proteins have a finite number of atoms and their representation can be built based upon a finite point cloud interpretation, periodic crystals are unbounded in size, making their representation more challenging. In the present work, we adapt the Pointwise Distance Distribution (PDD), a continuous and generically complete isometry invariant for periodic point sets, as a representation for our learning algorithm. The PDD distinguished all (more than 660 thousand) periodic crystals in the Cambridge Structural Database as purely periodic sets of points without atomic types. We develop a transformer model with a modified self-attention mechanism that combines PDD with compositional information via a spatial encoding method. This model is tested on the crystals of the Materials Project and Jarvis-DFT databases and shown to produce accuracy on par with state-of-the-art methods while being several times faster in both training and prediction time.
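To make the representation concrete, the following is a minimal sketch of how a PDD-style matrix could be computed for a small periodic point set. It is not the authors' implementation. It assumes the commonly stated definition of the PDD: for each motif point, take the sorted distances to its k nearest neighbors in the infinite periodic set, then collapse identical rows into one row with a weight. The function name `pdd`, the parameters `reps` and `decimals`, and the brute-force neighbor search over a fixed block of neighboring cells are illustrative choices; a fixed `reps` is only adequate for small k and reasonably shaped cells.

```python
import itertools

import numpy as np


def pdd(motif_frac, cell, k, reps=2, decimals=8):
    """Sketch of a PDD-style matrix for a periodic point set (no atom types).

    motif_frac: (m, d) fractional coordinates of the motif points.
    cell:       (d, d) matrix whose rows are the lattice vectors.
    k:          number of nearest neighbors per motif point.
    reps:       ASSUMPTION: neighbors are searched only in cells shifted by
                -reps..reps along each lattice vector (not a general guarantee).
    decimals:   ASSUMPTION: rows equal after rounding are treated as identical.

    Returns an array with one row per distinct neighbor-distance row:
    column 0 is the row weight, columns 1..k are sorted distances.
    """
    motif_frac = np.asarray(motif_frac, dtype=float)
    cell = np.asarray(cell, dtype=float)
    dim = cell.shape[0]

    # Cartesian coordinates of the motif points.
    motif = motif_frac @ cell

    # Cartesian translations of the neighboring cells in the search block.
    integer_shifts = itertools.product(range(-reps, reps + 1), repeat=dim)
    shifts = np.array(list(integer_shifts), dtype=float) @ cell

    # Finite chunk of the infinite periodic set: every motif point in every cell.
    cloud = (motif[None, :, :] + shifts[:, None, :]).reshape(-1, dim)

    # Distances from each motif point to every point of the chunk, sorted per row.
    dists = np.linalg.norm(motif[:, None, :] - cloud[None, :, :], axis=-1)
    dists.sort(axis=1)

    # Column 0 is the zero distance of each point to itself, so skip it.
    rows = np.round(dists[:, 1:k + 1], decimals)

    # Collapse identical rows; the weight is the fraction of motif points
    # sharing that row. np.unique also sorts the rows lexicographically.
    unique_rows, counts = np.unique(rows, axis=0, return_counts=True)
    weights = counts / len(motif)
    return np.column_stack([weights, unique_rows])


# Usage: two points per unit cubic cell, at the corner and at the body center.
# Without atom types both points have the same neighbor distances.
body_centered = pdd([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]], np.eye(3), k=8)
```

In this usage example the two motif points are geometrically equivalent once atom types are ignored, so their rows collapse into a single row of weight 1. Because rows are sorted and weighted, the result does not depend on the order of the motif points, which is what lets a set-based model such as a transformer consume it.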
