OrgFlow: Generative Modeling of Organic Crystal Structures from Molecular Graphs

Mohammadmahdi Vahediahmar, Matthew A. McDonald, Feng Liu

TL;DR

OrgFlow is a flow-matching model that predicts organic crystal structures directly from molecular graphs, integrating molecular connectivity with periodic boundary conditions while preserving the symmetries of crystalline systems.

Abstract

Crystal structure prediction is a long-standing challenge in materials science, with most data-driven methods developed for inorganic systems. This leaves an important gap for organic crystals, which are central to pharmaceuticals, polymers, and functional materials, but present unique challenges, such as larger unit cells and strict chemical connectivity. We introduce a flow-matching model for predicting organic crystal structures directly from molecular graphs. The architecture integrates molecular connectivity with periodic boundary conditions while preserving the symmetries of crystalline systems. A bond-aware loss guides the model toward realistic local chemistry by enforcing distributions of bond lengths and connectivity. To support reliable and efficient training, we built a curated dataset of organic crystals, along with a preprocessing pipeline that precomputes bonds and edges, substantially reducing computational overhead during both training and inference. Experiments show that our method achieves a Match Rate more than 10 times higher than existing baselines while requiring fewer sampling steps for inference. These results establish generative modeling as a practical and scalable framework for organic crystal structure prediction.
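
To make "flow matching under periodic boundary conditions" and "sampling steps" concrete, the sketch below shows one common way to define the training target for fractional coordinates on a periodic cell and to integrate a velocity field with a fixed number of Euler steps. It is an illustrative assumption, not OrgFlow's confirmed formulation: the shortest-path interpolation on the torus, the function names (`wrap_displacement`, `interpolate`, `euler_sample`, `periodic_distance`), and the oracle velocity standing in for a trained network are all hypothetical.

```python
def wrap_displacement(d):
    """Map a fractional-coordinate difference to its shortest periodic image in [-0.5, 0.5)."""
    return (d + 0.5) % 1.0 - 0.5


def interpolate(x0, x1, t):
    """Return the point at time t on the shortest periodic path from x0 to x1,
    together with the constant velocity a model would be trained to regress."""
    u = [wrap_displacement(b - a) for a, b in zip(x0, x1)]
    xt = [(a + t * v) % 1.0 for a, v in zip(x0, u)]
    return xt, u


def euler_sample(x0, velocity_fn, n_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with n_steps Euler steps,
    wrapping coordinates back into the unit cell after every step."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [(xi + dt * vi) % 1.0 for xi, vi in zip(x, v)]
    return x


def periodic_distance(a, b):
    """Largest per-coordinate distance between a and b, measured through the periodic boundary."""
    return max(abs(wrap_displacement(q - p)) for p, q in zip(a, b))


# Hypothetical fractional coordinates: a noise sample x0 and a data sample x1.
x0 = [0.9, 0.1, 0.5]
x1 = [0.1, 0.9, 0.5]

# The first coordinate moves +0.2 across the cell boundary rather than -0.8 through the interior.
xt, u = interpolate(x0, x1, 0.5)

# An oracle velocity (assumption: stands in for a trained network) recovers x1 in 20 steps.
x_final = euler_sample(x0, lambda x, t: u, n_steps=20)
```

With a learned velocity field, `n_steps` would be the inference cost that the abstract refers to as sampling steps; the oracle field here is exact, so any step count reaches `x1`.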

Paper Structure

This paper contains 14 sections, 7 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Overview of OrgFlow. OrgFlow learns to generate organic crystal structures from molecular graphs. It preserves covalent bonds through a bond-aware loss, overcoming limitations of inorganic-focused models. The result is accurate, periodic crystals with more than 10× higher match rates and 25× fewer sampling steps.
  • Figure 2: Overview of OrgFlow. The architecture panel shows the combination of molecular embeddings, periodic edges, and symmetry-preserving layers. The flow-matching panel illustrates the conditional flow-matching process used to generate full crystal structures.
  • Figure 3: Examples of predicted crystal structures (green) compared to ground truth (gray). The figures show two-dimensional projections of three-dimensional structures. Only heavy atoms are shown for clarity.
  • Figure 4: Inference efficiency on Small Molecules. OrgFlow saturates at about $20$ steps, while FlowMM needs around $500$ to approach its best Match Rate.