
CrystalFramer: Rethinking the Role of Frames for SE(3)-Invariant Crystal Structure Modeling

Yusei Ito, Tatsunori Taniai, Ryo Igarashi, Yoshitaka Ushiku, Kanta Ono

TL;DR

The paper tackles SE(3)-invariant crystal structure modeling by rethinking frames. Instead of one static, structure-aligned frame per crystal, it proposes dynamic, per-atom frames that respond to learned interatomic interactions. It introduces CrystalFramer, which integrates dynamic frames into a transformer-based crystal encoder in two ways. It constructs atom- and layer-specific frames $F_i$, and it uses frame-projected coordinates to form invariant edge features, including angular Gaussian basis function (GBF) encodings. Experiments on JARVIS, MP, and OQMD show that CrystalFramer outperforms static-frame approaches and the baseline Crystalformer across multiple crystal property prediction tasks, particularly with max-frame definitions. A lightweight configuration also offers substantial speedups. The work highlights the importance of interaction-aware frames for SE(3)-invariant modeling. It also suggests extending dynamic frames to molecules, surfaces, and other domains with complex periodicity and symmetry.
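The following minimal Python sketch makes "frame-projected coordinates" and "angular GBF encodings" concrete. It is written under stated assumptions and is not the authors' implementation.

It assumes the following:
- A frame $F_i$ is stored as three orthonormal row vectors.
- The edge vector $r_{ij}$ points from atom i to (a periodic image of) atom j.
- The angular feature is the direction cosine between $r_{ij}$ and each frame axis, expanded with evenly spaced Gaussian basis functions.

The helper names `project`, `gaussian_basis`, and `angular_edge_feature`, as well as the GBF width, are hypothetical choices. Under any rotation, the frame and the edge vector rotate together. The projected coordinates, and therefore the features, are unchanged. Translations leave $r_{ij}$ itself unchanged.

```python
import math

def project(frame, vec):
    """Frame-projected coordinates: dot product of vec with each frame axis (row)."""
    return [sum(a * v for a, v in zip(axis, vec)) for axis in frame]

def gaussian_basis(x, num_basis=8, low=-1.0, high=1.0):
    """Expand scalar x with Gaussians centered at evenly spaced points in [low, high].

    Assumption: the Gaussian width equals the spacing between centers.
    """
    step = (high - low) / (num_basis - 1)
    centers = [low + k * step for k in range(num_basis)]
    return [math.exp(-((x - mu) ** 2) / (2.0 * step ** 2)) for mu in centers]

def angular_edge_feature(frame, r_ij, num_basis=8):
    """Hypothetical angular edge feature for the edge i -> j.

    The direction cosines of r_ij in the frame (projected coordinates divided
    by the distance) are each expanded with GBFs and concatenated.
    """
    dist = math.sqrt(sum(c * c for c in r_ij))
    feature = []
    for cosine in (c / dist for c in project(frame, r_ij)):
        feature.extend(gaussian_basis(cosine, num_basis))
    return feature
```

For example, take the identity frame and `r_ij = [2.0, 0.0, 0.0]`. The feature then has 3 × 8 = 24 entries, and the first group of eight peaks at the center nearest +1.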

Abstract

Crystal structure modeling with graph neural networks is essential for various applications in materials informatics, and capturing SE(3)-invariant geometric features is a fundamental requirement for these networks. A straightforward approach is to model with orientation-standardized structures through structure-aligned coordinate systems, or "frames." However, unlike molecules, determining frames for crystal structures is challenging due to their infinite and highly symmetric nature. In particular, existing methods rely on a statically fixed frame for each structure, determined solely by its structural information, regardless of the task under consideration. Here, we rethink the role of frames, questioning whether such simplistic alignment with the structure is sufficient, and propose the concept of dynamic frames. While accommodating the infinite and symmetric nature of crystals, these frames provide each atom with a dynamic view of its local environment, focusing on actively interacting atoms. We demonstrate this concept by utilizing the attention mechanism in a recent transformer-based crystal encoder, resulting in a new architecture called CrystalFramer. Extensive experiments show that CrystalFramer outperforms conventional frames and existing crystal encoders in various crystal property prediction tasks.
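The abstract's idea of building each atom's frame from attention weights can be illustrated with a small Python sketch of a max-weight frame. This is a hedged reading, not the paper's exact definition.

The sketch assumes the following construction:
- The first axis points toward the neighbor with the largest attention weight.
- The second axis is the Gram-Schmidt-orthogonalized direction to the next highest-weighted neighbor that is not collinear with the first.
- The third axis is their cross product.

The scores below are made-up inputs standing in for a learned, rotation-invariant attention layer. The neighbor list is a finite truncation of the periodic images, and all function names are hypothetical. If every relative vector is rotated by the same rotation, the weights do not change and the frame rotates with the structure. As a result, the frame-projected coordinates stay fixed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [c / n for c in v]

def max_frame(rel_vectors, weights, eps=1e-6):
    """Hypothetical max-weight frame for one atom in one layer.

    rel_vectors: relative position vectors r_ij to the (truncated) neighbors.
    weights: attention weights alpha_ij for the same neighbors.
    Returns three orthonormal axes as the rows of a right-handed frame.
    """
    # Neighbor indices sorted by decreasing weight (ties keep list order).
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    e1 = normalize(rel_vectors[order[0]])
    for j in order[1:]:
        v = rel_vectors[j]
        # Gram-Schmidt: remove the component of v along e1.
        proj = dot(v, e1)
        u = [vc - proj * ec for vc, ec in zip(v, e1)]
        if math.sqrt(dot(u, u)) > eps * math.sqrt(dot(v, v)):
            e2 = normalize(u)
            return [e1, e2, cross(e1, e2)]
    raise ValueError("all weighted neighbors are collinear; frame is undefined")

def project_onto_frame(frame, v):
    """Frame-projected coordinates of v; unchanged if frame and v rotate together."""
    return [dot(axis, v) for axis in frame]
```

In this sketch, ties between the top weights are broken by neighbor order, which is a simplification. A practical implementation might need a deterministic, symmetry-aware tie rule.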

Paper Structure

This paper contains 25 sections, 6 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Conventional static frame and proposed dynamic frames. Conventional frames are determined statically to align with the structure, ensuring consistency under rotation and providing a canonical global representation of the structure. This consistency is schematically illustrated by the curved arrows. By contrast, the proposed dynamic frames are determined for each atom in each message-passing layer, by considering the local dynamic environment around that atom in that layer.
  • Figure 2: CrystalFramer architecture. Dynamic frame construction and frame-based invariant edge features (highlighted in red) are introduced into Crystalformer, a transformer for crystals (Taniai et al., 2024).
  • Figure 3: Frame visualizations. Conventional PCA and lattice frames provide a global coordinate system based solely on the structure. The proposed dynamic frames extract different structural information for each atom and layer using dynamic attention weights, shown as varying transparency.
  • Figure A1: Conventional cell (green) and Niggli reduced cell (blue) for a face-centered cubic lattice.
  • Figure A2: Evolution of dynamic frames during training. We visualize the weighted PCA frames and max frames using model checkpoints taken every 200 epochs, starting from epoch 100 until 2000. Frames from earlier checkpoints are overlaid with higher transparency. Notably, the max frames stabilize more quickly than the weighted PCA frames.
  • (One additional figure is not listed.)
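Figures 3 and A2 contrast PCA-based frames with max frames. The Python sketch below gives a hedged illustration of a weighted PCA frame; it is not the paper's exact formulation.

The sketch works in four steps:
1. It builds a 3x3 weighted second-moment matrix from relative vectors $r_{ij}$ and attention weights.
2. It diagonalizes that matrix with a cyclic Jacobi eigenvalue routine, chosen here only to stay within the standard library.
3. It orders the axes by decreasing eigenvalue.
4. It fixes the sign of each of the first two axes so that its projection onto the weighted mean vector is nonnegative.

The lack of centering, the sign rule, and all function names are assumptions for illustration. Replacing the attention weights with uniform weights gives a purely structure-determined frame, closer in spirit to the conventional static PCA frame.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def jacobi_eigh(A, max_sweeps=50):
    """Eigen-decomposition of a symmetric 3x3 matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column k of V is the eigenvector for eigenvalue k.
    """
    a = [row[:] for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j)
        if off < 1e-30:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q]; the smaller root t keeps it stable.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(3)], v

def weighted_pca_frame(rel_vectors, weights):
    """Hypothetical weighted PCA frame from relative vectors and attention weights."""
    # Assumption: uncentered weighted second moments about atom i.
    cov = [[sum(w * r[x] * r[y] for r, w in zip(rel_vectors, weights))
            for y in range(3)] for x in range(3)]
    evals, V = jacobi_eigh(cov)
    order = sorted(range(3), key=lambda k: evals[k], reverse=True)
    mean = [sum(w * r[x] for r, w in zip(rel_vectors, weights)) for x in range(3)]
    axes = []
    for k in order[:2]:
        e = [V[row][k] for row in range(3)]
        # Sign convention (assumption): nonnegative projection onto the weighted mean.
        axes.append(e if dot(e, mean) >= 0.0 else [-x for x in e])
    # The third axis from the cross product keeps the frame right-handed.
    return [axes[0], axes[1], cross(axes[0], axes[1])]
```

This construction is rotation-invariant only when the top two eigenvalues are distinct and each axis has a clearly nonzero projection onto the weighted mean. Near-degenerate cases make the axes ill-defined, unlike the max-weight construction.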