Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

Kiyoung Seong, Sungsoo Ahn, Sehui Han, Changyoung Park

TL;DR

To enable multimodal flow in a standard transformer model, a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation is introduced, injecting strong compositional and crystallographic priors without explicit structural templates.

Abstract

Crystal modeling spans a family of conditional and unconditional generation tasks across different modalities, including crystal structure prediction (CSP) and de novo generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across different generation tasks. To address this limitation, we propose Multimodal Crystal Flow (MCFlow), a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting strong compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that MCFlow achieves competitive performance against task-specific baselines across multiple crystal generation tasks.
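To make the idea of "distinct inference trajectories" concrete, the following is a minimal sketch of how a task could be selected purely by the path taken through the two time variables, with $t$ for atom types and $s$ for structures. The convention that 0 means noise and 1 means clean data, the straight-line paths, and the function name `time_schedule` are all assumptions made for illustration; the paper's actual schedules may differ.

```python
from typing import List, Tuple


def time_schedule(task: str, num_steps: int) -> List[Tuple[float, float]]:
    """Return (t, s) pairs along a hypothetical inference trajectory.

    Assumed convention (not taken from the paper): 0 = pure noise,
    1 = clean data. t is the atom-type time, s is the structure time.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    grid = [k / num_steps for k in range(num_steps + 1)]
    if task == "csp":
        # Atom types are given (t fixed at 1); only the structure is generated.
        return [(1.0, s) for s in grid]
    if task == "atom_type":
        # Structure is given (s fixed at 1); only atom types are generated.
        return [(t, 1.0) for t in grid]
    if task == "dng":
        # Both modalities are generated; a diagonal path is assumed here.
        return [(u, u) for u in grid]
    raise ValueError(f"unknown task: {task}")


csp_path = time_schedule("csp", 4)
dng_path = time_schedule("dng", 2)
```

Under these assumptions, a single trained model would be queried at each `(t, s)` pair in turn, so switching tasks only changes the schedule and not the network.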

Paper Structure (28 sections, 29 equations, 11 figures, 12 tables, 3 algorithms)

Figures (11)

  • Figure 1: Overview of multimodal crystal flow with any-to-any modality generation. MCFlow trains a flow model with two independent time variables corresponding to atom types ($t$) and structures ($s$). By selecting task-specific inference trajectories in the $(t,s)$ space, a single model performs crystal structure prediction, atom type generation, and de novo generation.
  • Figure 2: Composition- and symmetry-aware atom ordering with hierarchical permutation augmentation. Atoms in the primitive unit cell are lexicographically ordered by Pauling electronegativity and Wyckoff position (denoted by letters $a,b,c,\ldots$) to expose compositional and crystallographic structure. The ordering and augmentation are illustrated on a $\mathrm{Th}_5\mathrm{C}$ crystal in the $\mathrm{R}\bar{3}\mathrm{m}$ space group. Inter-orbit permutation reorders orbits sharing the same element and Wyckoff position, while intra-orbit permutation permutes atoms within each orbit. Together, the ordering and hierarchical permutation augmentation provide compositional and crystallographic symmetry information to sequence-based Transformers without explicitly conditioning on space group or Wyckoff positions.
  • Figure 3: Effect of the number of integration steps on performance. Crystal structure prediction (CSP) match rate and de novo generation (DNG) validity rate (both structural and compositional), evaluated at different numbers of integration steps.
  • Figure 4: Distributions of crystallographic symmetry properties. Distributions of generated structures on MP-20. From left to right, panels show the distributions of space groups (P1, Fm$\bar{3}$m, C2/m, P6$_3$/mmc, I4/mmm, Pnma, R$\bar{3}$m, Cmcm, Pm$\bar{3}$m, P4/mmm, P$\bar{1}$, $\ldots$), Wyckoff multiplicity (1, 2, 3, 4, 6), and Wyckoff dimensionality (0, 1, 2, 3). Space groups and Wyckoff positions are determined using the SpacegroupAnalyzer class in pymatgen with a distance tolerance of 0.1 Å.
  • Figure 5: Distributions of relaxed structures. Distributions of convergence steps, RMSE between initial and relaxed structures, and $E_{\text{hull}}$ along CHGNet geometry optimization trajectories, compared to ADiT and FlowMM. The convergence rates are 99.73/60.79/87.65/99.54% for MCFlow/FlowMM/ADiT/ADiT Joint.
  • ...and 6 more figures
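
The ordering and augmentation described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative reading of the caption, not the authors' implementation: the `Atom` record, the `order_atoms` function, the ascending sort direction, and the toy orbit labels are all hypothetical, and the toy atoms do not reproduce the actual $\mathrm{Th}_5\mathrm{C}$ structure. Only the Pauling electronegativities (Th 1.30, C 2.55) are standard values.

```python
import random
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Pauling electronegativities for the two elements in the toy example.
PAULING: Dict[str, float] = {"Th": 1.30, "C": 2.55}


@dataclass(frozen=True)
class Atom:
    element: str   # chemical symbol
    wyckoff: str   # Wyckoff letter, e.g. "a", "b", "c"
    orbit: int     # hypothetical id of the symmetry orbit the atom belongs to
    index: int     # original position in the unit-cell listing


def group_key(atom: Atom) -> Tuple[float, str, str]:
    # Lexicographic key: electronegativity first (ascending is assumed),
    # then element symbol to break ties, then Wyckoff letter.
    return (PAULING[atom.element], atom.element, atom.wyckoff)


def order_atoms(atoms: List[Atom], rng: Optional[random.Random] = None) -> List[Atom]:
    """Order atoms canonically; if rng is given, apply permutation augmentation."""
    orbits: Dict[int, List[Atom]] = {}
    for atom in atoms:
        orbits.setdefault(atom.orbit, []).append(atom)
    orbit_list = sorted(orbits.values(), key=lambda o: (group_key(o[0]), o[0].orbit))

    ordered: List[Atom] = []
    # Orbits with the same element and Wyckoff letter form one group.
    for _, same_key in groupby(orbit_list, key=lambda o: group_key(o[0])):
        group = list(same_key)
        if rng is not None:
            rng.shuffle(group)        # inter-orbit permutation
        for orbit in group:
            members = sorted(orbit, key=lambda a: a.index)
            if rng is not None:
                rng.shuffle(members)  # intra-orbit permutation
            ordered.extend(members)
    return ordered


# Toy input with made-up orbit and Wyckoff assignments.
atoms = [
    Atom("C", "a", 0, 0),
    Atom("Th", "c", 1, 1), Atom("Th", "c", 1, 2),
    Atom("Th", "c", 2, 3), Atom("Th", "c", 2, 4),
    Atom("Th", "b", 3, 5),
]
canonical = order_atoms(atoms)
augmented = order_atoms(atoms, random.Random(0))
```

In this sketch the sequence of (element, Wyckoff letter) labels is identical for `canonical` and `augmented`; augmentation only changes which symmetry-equivalent atom or orbit occupies each slot, which is the sense in which the permutations are hierarchical.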