CrystalFlow: A Flow-Based Generative Model for Crystalline Materials

Xiaoshan Luo, Zhenyu Wang, Qingchang Wang, Jian Lv, Lei Wang, Yanchao Wang, Yanming Ma

TL;DR

CrystalFlow introduces a flow-based crystal generative model that combines Continuous Normalizing Flows with Conditional Flow Matching and a jointly equivariant graph neural network to generate lattice parameters, fractional coordinates, and atom types under conditioning such as external pressure. By enforcing crystal symmetries through a rotation-invariant lattice representation and periodic translation invariance for fractional coordinates, the model achieves data-efficient learning and high-quality sampling, with an architecture capable of de novo generation as well as property-conditioned design. Across multiple CSP benchmarks (MP-20, MPTS-52, MP-CALYPSO-60) and through DFT validation, CrystalFlow attains state-of-the-art or competitive results in match rate, RMSE, and enthalpy, while enabling targeted generation of structures with desired energies or pressures. The work highlights the practicality of flow-based, symmetry-aware generation for materials discovery and suggests avenues for future improvements, including larger multi-property datasets and hybrid architectures that combine flows with autoregressive components.

Abstract

Deep learning-based generative models have emerged as powerful tools for modeling complex data distributions and generating high-fidelity samples, offering a transformative approach to efficiently explore the configuration space of crystalline materials. In this work, we present CrystalFlow, a flow-based generative model specifically developed for the generation of crystalline materials. CrystalFlow constructs Continuous Normalizing Flows to model lattice parameters, atomic coordinates, and/or atom types, which are trained using Conditional Flow Matching techniques. Through an appropriate choice of data representation and the integration of a graph-based equivariant neural network, the model effectively captures the fundamental symmetries of crystalline materials, which ensures data-efficient learning and enables high-quality sampling. Our experiments demonstrate that CrystalFlow achieves state-of-the-art performance across standard generation benchmarks, and exhibits versatile conditional generation capabilities including producing structures optimized for specific external pressures or desired material properties. These features highlight the model's potential to address realistic crystal structure prediction challenges, offering a robust and efficient framework for advancing data-driven research in condensed matter physics and material science.
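For orientation, the Conditional Flow Matching objective mentioned in the abstract is often written in the following generic form. This is a sketch rather than the paper's exact loss: the placeholder variable $x$ (standing for any of the lattice, coordinate, or atom-type components), the prior $p_0$, the data distribution $p_1$, and the straight-line path between them are assumptions chosen for illustration.

$$
\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}(0,1),\; x_0 \sim p_0,\; x_1 \sim p_1} \left\| v_{t;\theta}(x_t) - u_t(x_t \mid x_0, x_1) \right\|^2, \qquad x_t = (1-t)\,x_0 + t\,x_1, \qquad u_t(x_t \mid x_0, x_1) = x_1 - x_0 .
$$

Under this assumed path the regression target is constant in $t$, so the network only has to learn the displacement from a prior sample to a data sample. Sampling then amounts to integrating $\mathrm{d}x/\mathrm{d}t = v_{t;\theta}(x)$ from $t = 0$ to $t = 1$.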

Paper Structure

This paper contains 24 sections, 20 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Model architecture of CrystalFlow. Random structures, represented by lattice representations $\mathbf{k}_0$, fractional coordinates $\mathbf{F}_0$, and atom types $\mathbf{A}_0$, are sampled from prior distributions. Real structures, characterized by $\mathbf{k}_1$, $\mathbf{F}_1$, and $\mathbf{A}_1$, are sampled from the dataset. Continuous normalizing flows are established between these two sets, defined by vector fields $u_t^k$, $u_t^F$, and $u_t^A$ at time $t$. Intermediate structure components $\mathbf{k}_t$, $\mathbf{F}_t$, and $\mathbf{A}_t$ at a given time $t$, along with conditioning variables, serve as inputs to a graph neural network, which outputs vector fields $v_{t;\theta}^k$, $v_{t;\theta}^F$, and $v_{t;\theta}^A$. The model is trained by regressing the vector field $v$ to match $u$. For CSP tasks, $\mathbf{A}_0 \equiv \mathbf{A}_1$ is fixed as a conditioning variable.
  • Figure 2: Performance comparison between structures generated by CrystalFlow and the previous Cond-CDVAE model. CrystalFlow is trained on the MP-CALYPSO-60 dataset. Integration steps of $S = 100, 1000,$ and $5000$ are utilized for CrystalFlow, while $S = 5000$ is employed for Cond-CDVAE. (a) The relationship between the DFT-computed lattice stress and the target pressure for 500 structures generated by each model. The composition and target pressure are randomly sampled from the test set. (b) Distributions of enthalpy differences for these structures before and after local optimization. (c) Average energy curves during local optimization for 200 SiO$_2$ structures generated by each model at 0 GPa, with shaded areas denoting standard deviation. (d) Energy distributions of these SiO$_2$ structures before and after local optimization.
  • Figure 3: Performance of CrystalFlow in DNG tasks with targeted properties. Distributions of formation energy ($E_\text{F}$) for CrystalFlow-generated structures, conditioned on target values of $E_\text{F} = 0, -1, -2, -3,$ and $-4$ eV/atom. The distributions are shown for structures (top) before and (bottom) after geometric optimization. For each target, 10,000 structures are generated. The dotted curve denotes the corresponding distribution of the training set.
  • Figure S3-1: Distribution of equilibrium pressure in the MP-CALYPSO-60 dataset.
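
The training and sampling loop described in the Figure 1 caption, together with the integration-step count $S$ from Figure 2, can be illustrated for the fractional coordinates alone. The sketch below is a minimal NumPy illustration, not the authors' implementation: the uniform prior, the shortest-image linear path on the unit torus, the Euler integrator, and every function name (`wrap_diff`, `cfm_pair`, `cfm_loss`, `euler_sample`) are assumptions made for this example, and an oracle vector field stands in for the graph neural network.

```python
import numpy as np


def wrap_diff(d):
    """Map displacements to their shortest periodic image in [-0.5, 0.5)."""
    return (d + 0.5) % 1.0 - 0.5


def cfm_pair(f0, f1, t):
    """Build one training pair for fractional coordinates.

    Assumed path: constant-velocity motion along the shortest periodic
    displacement from f0 (prior sample) to f1 (data sample).
    Returns the intermediate coordinates F_t and the target field u_t.
    """
    u = wrap_diff(f1 - f0)      # target vector field, constant in t
    ft = (f0 + t * u) % 1.0     # intermediate point, wrapped into [0, 1)
    return ft, u


def cfm_loss(v_pred, u):
    """Mean-squared regression loss between predicted and target fields."""
    return float(np.mean((v_pred - u) ** 2))


def euler_sample(f0, field, steps):
    """Integrate dF/dt = field(F, t) from t=0 to t=1 with `steps` Euler steps."""
    f = f0.copy()
    dt = 1.0 / steps
    for s in range(steps):
        f = (f + dt * field(f, s * dt)) % 1.0
    return f


# Toy usage: 4 atoms, 3 fractional coordinates each.
rng = np.random.default_rng(0)
f0 = rng.random((4, 3))         # assumed uniform prior over the unit cell
f1 = rng.random((4, 3))         # stand-in for a structure from the dataset
ft, u = cfm_pair(f0, f1, t=0.5)

# A trained network would predict v from (ft, t, conditions); here the
# oracle field simply returns the known target u.
f_end = euler_sample(f0, lambda f, t: u, steps=100)
```

With the oracle field, the integrated sample lands on `f1` up to periodic wrapping and floating-point error, which is the behavior a well-trained model is meant to approximate. Because the target depends only on the wrapped difference `f1 - f0`, shifting both structures by the same translation leaves it unchanged, which is one simple way to respect the periodic translation invariance the summary refers to.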