MolCrystalFlow: Molecular Crystal Structure Prediction via Flow Matching
Cheng Zeng, Harry W. Sullivan, Thomas Egg, Maya M. Martirossyan, Philipp Höllmer, Jirui Jin, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Ellad B. Tadmor, Mingjie Liu
TL;DR
MolCrystalFlow tackles molecular crystal structure prediction (CSP) by learning a joint, periodic, E(3)-invariant flow over the lattice $L$, fractional centroids $F$, and orientations $R$ while treating molecules as rigid bodies. It combines an invariant EGNN-based molecular embedding with a Riemannian flow that evolves $(L,F,R)$ on their respective manifolds, using learned velocity fields $u_t$ and geodesics on $\text{SO}(3)$ to generate full crystal structures. Benchmarking on Thurlemann and OMC25-MCF demonstrates state-of-the-art performance among flow-based methods and competitive results relative to rule-based approaches, and end-to-end integration with u-MLIP and DFT enables rapid polymorph screening. The work highlights a scalable, geometry-aware path toward data-driven discovery of molecular crystals, and outlines future improvements such as energy-aware training, torsional flexibility, and space-group-constrained manifolds. The lattice $L\in\mathbb{R}^{3\times3}$, the fractional centroids $F\in[0,1)^{3}$, and the orientations $R\in\text{SO}(3)$ are jointly modeled through learned velocity fields to produce physically plausible packings that respect periodicity and symmetry.
Abstract
Molecular crystal structure prediction represents a grand challenge in computational chemistry due to the large sizes of the constituent molecules and their complex intra- and intermolecular interactions. While generative modeling has revolutionized structure discovery for molecules, inorganic solids, and metal-organic frameworks, extending such approaches to fully periodic molecular crystals remains elusive. Here, we present MolCrystalFlow, a flow-based generative model for molecular crystal structure prediction. The framework disentangles intramolecular complexity from intermolecular packing by embedding molecules as rigid bodies and jointly learning the lattice matrix, molecular orientations, and centroid positions. Centroids and orientations are represented on their native Riemannian manifolds, allowing geodesic flow construction and graph neural network operations that respect geometric symmetries. We benchmark our model against state-of-the-art generative models for large-size periodic crystals and rule-based structure generation methods on two open-source molecular crystal datasets. We demonstrate the integration of MolCrystalFlow with a universal machine learning potential to accelerate molecular crystal structure prediction, paving the way for data-driven generative discovery of molecular crystals.
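
To make the idea of geodesic flow construction on native manifolds concrete, the following is a minimal sketch of geodesic interpolation for fractional centroids on the flat torus $[0,1)^3$ and for orientations on $\text{SO}(3)$. It is an illustration under stated assumptions, not the MolCrystalFlow implementation: the function names (`torus_geodesic`, `so3_log`, `so3_exp`, `so3_geodesic`) are hypothetical, the interpolants are the standard textbook forms $F_t = (F_0 + t\,\mathrm{wrap}(F_1 - F_0)) \bmod 1$ and $R_t = R_0 \exp(t \log(R_0^\top R_1))$, and the choice of prior, time schedule, and lattice parameterization used in the paper is not reproduced here.

```python
import numpy as np


def torus_geodesic(f0, f1, t):
    """Shortest-path interpolation between fractional coordinates on [0, 1)^3."""
    # Wrap the displacement into [-0.5, 0.5) so the path takes the short way
    # around each periodic axis.
    delta = (f1 - f0 + 0.5) % 1.0 - 0.5
    return (f0 + t * delta) % 1.0


def so3_log(R):
    """Rotation vector (axis times angle) of R; assumes rotation angle < pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w


def so3_exp(v):
    """Rodrigues formula: rotation matrix for the rotation vector v."""
    theta = np.linalg.norm(v)
    if theta < 1e-8:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def so3_geodesic(R0, R1, t):
    """Point at time t on the SO(3) geodesic from R0 (t=0) to R1 (t=1)."""
    return R0 @ so3_exp(t * so3_log(R0.T @ R1))
```

In a flow-matching setup of this kind, the quantities `delta` and `so3_log(R0.T @ R1)` would typically serve as the constant conditional velocities along each geodesic, which a network is then trained to regress. Whether MolCrystalFlow uses exactly these targets is an assumption of this sketch.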
