Flow Matching for Accelerated Simulation of Atomic Transport in Crystalline Materials

Juno Nam, Sulin Liu, Gavin Winter, KyuJung Jun, Soojung Yang, Rafael Gómez-Bombarelli

TL;DR

LiFlow addresses the computational bottleneck of atomistic diffusion simulations in crystalline materials by reframing MD propagation as conditional generation of atomic displacements, specifically modeling $p(\bm{D}_{\Delta\tau} \mid \bm{X}_{\tau}, \bm{L}, \bm{a}, T)$. It employs a flow-matching framework with a Propagator to advance coordinates and a Corrector to rectify unphysical geometries, guided by a Maxwell–Boltzmann adaptive prior that accounts for temperature and composition within an equivariant PaiNN backbone. On a universal MLIP dataset of $4{,}186$ lithium-containing structures across four temperatures, LiFlow reproduces diffusion observables (Spearman rank correlation of $0.7$–$0.8$ for lithium MSD) and scales to large supercells with speedups of up to $6 \times 10^5$ compared to AIMD, enabling scalable screening of solid-state electrolytes. The approach transfers across compositions and can extend short AIMD trajectories while preserving key kinetic and structural statistics, though extrapolation beyond training temperatures and uncertainty quantification remain important areas for future work.
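
As a rough illustration of the generation step described above, the following minimal sketch samples a prior displacement and transports it to a target along a straight-line flow-matching path using explicit Euler steps. It is not the paper's implementation: the per-atom prior width `scale * sqrt(k_B T / m)`, the `scale` factor, the oracle `conditional_field`, and all function names are assumptions made for illustration. In a trained model, a learned equivariant network would stand in for the oracle field.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def sample_prior(masses, temperature, scale, rng):
    """Gaussian prior displacements with width sigma_i = scale * sqrt(k_B T / m_i).

    The sqrt(T / m) dependence mirrors the Maxwell-Boltzmann velocity
    distribution; `scale` is a hypothetical factor, not a value from the paper.
    """
    sigma = scale * np.sqrt(K_B * temperature / masses)  # shape (n_atoms,)
    return rng.normal(size=(masses.size, 3)) * sigma[:, None]


def conditional_field(d_t, d_1, t):
    """Vector field of the straight-line path D_t = (1 - t) D_0 + t D_1."""
    return (d_1 - d_t) / (1.0 - t)


def euler_generate(d_0, field, n_steps):
    """Integrate dD/dt = field(D, t) from t = 0 to t = 1 with explicit Euler."""
    d = d_0.copy()
    h = 1.0 / n_steps
    for k in range(n_steps):
        d = d + h * field(d, k * h)
    return d


rng = np.random.default_rng(0)
masses = np.array([6.94, 6.94, 30.97, 32.06])  # Li, Li, P, S (amu), illustrative
d_0 = sample_prior(masses, temperature=800.0, scale=1.0, rng=rng)
d_1 = rng.normal(size=(4, 3))  # stand-in for a reference displacement
d_gen = euler_generate(d_0, lambda d, t: conditional_field(d, d_1, t), n_steps=10)
```

With the oracle field, the Euler integration lands on `d_1` regardless of the prior sample, which makes the sketch easy to check. A learned field would only approximate this transport.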

Abstract

Atomic transport underpins the performance of materials in technologies such as energy storage and electronics, yet its simulation remains computationally demanding. In particular, modeling ionic diffusion in solid-state electrolytes (SSEs) requires methods that can overcome the scale limitations of traditional ab initio molecular dynamics (AIMD). We introduce LiFlow, a generative framework to accelerate MD simulations for crystalline materials that formulates the task as conditional generation of atomic displacements. The model uses flow matching, with a Propagator submodel to generate atomic displacements and a Corrector to locally correct unphysical geometries, and incorporates an adaptive prior based on the Maxwell-Boltzmann distribution to account for chemical and thermal conditions. We benchmark LiFlow on a dataset comprising 25-ps trajectories of lithium diffusion across 4,186 SSE candidates at four temperatures. The model obtains a consistent Spearman rank correlation of 0.7-0.8 for lithium mean squared displacement (MSD) predictions on unseen compositions. Furthermore, LiFlow generalizes from short training trajectories to larger supercells and longer simulations while maintaining high accuracy. With speed-ups of up to 600,000$\times$ compared to first-principles methods, LiFlow enables scalable simulations at significantly larger length and time scales.
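
The evaluation metric mentioned above can be sketched as follows, assuming a simple first-to-last-frame definition of MSD on unwrapped coordinates and a Spearman correlation computed from ranks without tie handling. Both function names and the toy numbers are illustrative assumptions, and the paper's exact MSD estimator may differ.

```python
import numpy as np


def mean_squared_displacement(traj):
    """MSD between the first and last frame of an unwrapped trajectory.

    `traj` has shape (n_frames, n_atoms, 3).
    """
    disp = traj[-1] - traj[0]
    return float(np.mean(np.sum(disp**2, axis=-1)))


def spearman(x, y):
    """Spearman rank correlation, assuming no tied values."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Two atoms displaced by (1, 0, 0) and (0, 2, 0) between two frames.
traj = np.zeros((2, 2, 3))
traj[1, 0] = [1.0, 0.0, 0.0]
traj[1, 1] = [0.0, 2.0, 0.0]
msd = mean_squared_displacement(traj)

# Toy reference and predicted MSD values for four hypothetical materials.
reference = np.array([0.1, 1.0, 10.0, 100.0])
same_order = np.array([0.3, 0.8, 20.0, 50.0])
one_swap = np.array([0.8, 0.3, 20.0, 50.0])
rho_same = spearman(reference, same_order)
rho_swap = spearman(reference, one_swap)
```

Because the correlation depends only on ranks, a model can score well by ordering materials correctly even when its absolute MSD values are off.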

Paper Structure

This paper contains 22 sections, 2 theorems, 20 equations, 19 figures, and 6 tables.

Key Result

Proposition 1

Given an invariant base distribution $p_0(\bm{D}_0)$ satisfying the symmetry conditions (eq:sym_perm, eq:sym_rot) and an equivariant conditional vector field $u_t(\bm{D}_t \vert \bm{D}_1)$ with the following properties: the generated conditional probability path $p_{t \vert 1}(\bm{D}_t \vert \bm{D}_1)$ is invariant. Furthermore, given that the data distribution $q(\bm{D}_1)$ is invariant, the marginal probability path $p_t(\ldots)$ ...

Figures (19)

  • Figure 1: LiFlow scheme. LiFlow is a generative acceleration framework for MD simulations for crystalline materials, with Propagator and Corrector components leveraging a conditional flow matching scheme for accurate generation of atomic displacements during time propagation. Transferability across chemical composition, temperatures, and supercell sizes is considered in designing the task, adaptive prior distribution, and flow model architectures.
  • Figure 1: Dataset statistics. (a) Elemental count distribution across the unit cells of the structures in the dataset. (b) Histogram of lithium MSD values from 25-ps MD simulations at different temperatures. (c) Distribution of atom counts (in the constructed supercell) per structure. (d) Distribution of element counts per structure. (e) Space group distribution of the structures.
  • Figure 2: Parity plots for kinetic metrics and trajectory visualizations. (a, b) Parity plots comparing the log MSD values for (a) lithium and (b) frame atoms in 800 K, 25 ps simulations across test set materials. Data points are colored by their respective prior scales. (c) Reference and generated trajectories for argyrodite Li6PS5Br (highlighted points in a and b).
  • Figure 2: Universal model inference example. (Top) Parity plots comparing the log MSD values for lithium and frame atoms in 800 K simulations (reference vs. 25-step LiFlow inference) across 419 test materials. Data points are colored by their respective prior scales, with four annotated examples (I–IV) highlighted below. II and III represent failed cases where lithium MSD is overestimated and underestimated, respectively. Dotted lines indicate the classification boundary between large and small priors. (Bottom) Reference and generated trajectories for the four annotated test set materials.
  • Figure 3: Reproducing diffusivity from AIMD models. (a) Lithium self-diffusivity ($D^\ast$) for Li3PS4 (LPS) polymorphs, derived from AIMD (25 ps training, $\sim$250 ps full trajectories [jun2024nonexistence]) and 250 ps LiFlow inference. (b) Lithium $D^\ast$ as a function of $1000/T$ for Li10GeP2S12 (LGPS), using AIMD (25 ps training, $\sim$150 ps full trajectories [lopez2024how]) and 150 ps LiFlow inference on a $2 \times 2 \times 1$ supercell. Scatter points and error bars represent the median and 95% confidence intervals (CIs) for $D^\ast$, from 1,000 MCMC samples of the Bayesian regression [mccluskey2024kinisi]. The shaded region represents the CIs for the Arrhenius fit ($1/T$ vs. $\log D^\ast(T)$) from 25 ps AIMD data. (c) Results for a $4 \times 4 \times 4$ supercell from fine-tuned MLIP simulations [winter2023simulations] and 1 ns LiFlow inference.
  • ...and 14 more figures
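
The Arrhenius analysis referenced in the Figure 3 caption can be illustrated with a minimal sketch. It uses the Einstein relation $D^\ast = \mathrm{MSD} / (2 d t)$ and an ordinary least-squares line through $\ln D^\ast$ versus $1/T$, which is a simpler substitute for the Bayesian regression named in the caption. The temperatures, activation energy, and prefactor below are synthetic values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def self_diffusivity(msd, time, dim=3):
    """Einstein relation: D* = MSD / (2 * dim * t)."""
    return msd / (2.0 * dim * time)


def arrhenius_fit(temperatures, diffusivities):
    """Least-squares fit of ln D* = ln D_0 - E_a / (k_B T).

    Returns (E_a in eV, prefactor D_0 in the units of `diffusivities`).
    """
    slope, intercept = np.polyfit(1.0 / temperatures, np.log(diffusivities), 1)
    return -slope * K_B, np.exp(intercept)


# Synthetic diffusivities generated from an exact Arrhenius law.
temps = np.array([600.0, 800.0, 1000.0, 1200.0])  # K, illustrative
d_true = 1e-3 * np.exp(-0.25 / (K_B * temps))
e_a, d_0 = arrhenius_fit(temps, d_true)
```

On noise-free synthetic data the fit recovers the input activation energy and prefactor. Real trajectories would need an uncertainty-aware estimate of $D^\ast$, as the caption indicates.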

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 1