PackFlow: Generative Molecular Crystal Structure Prediction via Reinforcement Learning Alignment

Akshay Subramanian; Elton Pan; Juno Nam; Maurice Weiler; Shuhui Qu; Cheol Woo Park; Tommi S. Jaakkola; Elsa Olivetti; Rafael Gomez-Bombarelli

TL;DR

PackFlow is a flow matching framework for molecular crystal structure prediction (CSP) that generates heavy-atom crystal proposals by jointly sampling Cartesian coordinates and unit-cell lattice parameters given a molecular graph. The authors also propose physics alignment, a reinforcement learning post-training stage that uses machine-learned interatomic potential (MLIP) energies and forces as stability proxies.

Abstract

Organic molecular crystals underpin technologies ranging from pharmaceuticals to organic electronics, yet predicting the solid-state packing of molecules remains challenging because candidate generation is combinatorial and stability is only resolved after costly energy evaluations. Here we introduce PackFlow, a flow matching framework for molecular crystal structure prediction (CSP) that generates heavy-atom crystal proposals by jointly sampling Cartesian coordinates and unit-cell lattice parameters given a molecular graph. This lattice-aware generation interfaces directly with downstream relaxation and lattice-energy ranking, positioning PackFlow as a scalable proposal engine within standard CSP pipelines. To explicitly steer generation toward physically favourable regions, we propose physics alignment, a reinforcement learning post-training stage that uses machine-learned interatomic potential energies and forces as stability proxies. Physics alignment improves physical validity without altering inference-time sampling. We validate PackFlow's performance against heuristic baselines through two distinct evaluations. First, on a broad unseen set of molecular systems, we demonstrate superior candidate generation capability, with proposals exhibiting greater structural similarity to experimental polymorphs. Second, we assess the full end-to-end workflow, including relaxation and lattice-energy analysis, on two unseen CSP blind-test case studies. In both settings, PackFlow outperforms heuristics-based methods by concentrating probability mass in low-energy basins, yielding candidates that relax into lower-energy minima and offering a practical route to amortize the relax-and-rank bottleneck.
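The abstract describes PackFlow as jointly sampling Cartesian coordinates and lattice parameters with flow matching, and the Figure 2 caption notes that coordinate and lattice flow times are sampled independently. The minimal NumPy sketch below illustrates what such a training target could look like. It assumes a linear interpolation path between standard Gaussian noise and data, a six-component lattice vector (a, b, c, alpha, beta, gamma) that is already normalized, and a hypothetical `model` callable; these are illustrative assumptions, not PackFlow's actual parameterization.

```python
import numpy as np

def flow_matching_targets(x1, l1, rng):
    """Build noisy inputs and velocity targets for one crystal.

    x1: (N, 3) heavy-atom Cartesian coordinates (data endpoint).
    l1: (6,) lattice parameters (a, b, c, alpha, beta, gamma); assumed
        already normalized to a roughly unit scale.
    """
    # Independent flow times for coordinates and lattice.
    t_x = rng.uniform()
    t_l = rng.uniform()
    # Assumption: standard Gaussian noise as the source distribution.
    x0 = rng.standard_normal(x1.shape)
    l0 = rng.standard_normal(l1.shape)
    # Assumption: linear path z_t = (1 - t) * z0 + t * z1.
    xt = (1.0 - t_x) * x0 + t_x * x1
    lt = (1.0 - t_l) * l0 + t_l * l1
    # For a linear path the conditional velocity target is z1 - z0.
    return (xt, lt, t_x, t_l), (x1 - x0, l1 - l0)

def flow_matching_loss(model, x1, l1, rng):
    """Sum of coordinate and lattice vector-field regression losses.

    `model` is a hypothetical callable (xt, lt, t_x, t_l) -> (vx, vl).
    """
    inputs, (ux, ul) = flow_matching_targets(x1, l1, rng)
    vx, vl = model(*inputs)
    return np.mean((vx - ux) ** 2) + np.mean((vl - ul) ** 2)
```

Under the same assumptions, sampling would integrate both learned vector fields from t = 0 to t = 1 (for example with Euler steps). The paper's own rationale for decoupling the two times is discussed in its "Independent flow times for coordinates and lattice" section.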

Paper Structure (64 sections, 34 equations, 8 figures, 6 tables, 3 algorithms)

Introduction
Results
PackFlow-enabled crystal structure prediction workflow
A bond-aware flow-matching transformer
Problem setup and flow-matching objective
Evaluation of heavy-atom proposal quality
Independent flow times for coordinates and lattice
Covalent-bond attention bias improves physical validity
Unwrapped data representation
Matching experimental statistics and symmetric attention
Physics alignment via reinforcement learning
Alignment signal from heavy-atom energies and forces
Group relative policy optimization for flow models
Advantage mixing instead of reward mixing
Alignment improves physical validity and proximity to ground truths
...and 49 more sections

Figures (8)

Figure 1: Schematic overview of the PackFlow-enhanced molecular crystal structure prediction workflow. (a) The pipeline consists of several steps: heavy-atom crystal generation with PackFlow, hydrogen addition and H-only relaxation with an MLIP, and full crystal relaxation and lattice-energy ranking with an MLIP to obtain the final metastable polymorphs. (b) PackFlow-Base models generate candidates with lower lattice energies than heuristic structure generation methods. Physics alignment drives lattice energies further down relative to the base-model initialization. (c, d) PackFlow training is divided into pre-training and post-training stages. Pre-training (c) trains the coordinate and lattice vector fields simultaneously with the standard flow-matching objective. Post-training (d) trains with a GRPO objective on approximate heavy-atom energies and forces obtained from an MLIP. We abbreviate "vector field" as VF. (e) A sample flow trajectory on the test set demonstrating joint sampling of lattice and coordinates.
Figure 2: Architectural components of PackFlow and crystal preprocessing. (a) Coordinates, atom types, lattice parameters, and flow times are jointly embedded into per-atom tokens, which are fed into a transformer encoder. Coordinate and lattice flow times are sampled independently. Covalent bonding information is embedded as an additive bias on the transformer attention scores. Coordinate and lattice vector fields (VFs) are obtained as readouts from the updated atom tokens. (b) For all crystals in the dataset, molecules are made whole across unit-cell boundaries (unwrapped) and centered by mean subtraction. See the "Unwrapped data representation" section for details.
Figure 3: Generation quality of PackFlow-Base models and analysis of learned attention scores. (a) Bond-length, (b) bond-angle, (c) lattice-length, and (d) lattice-angle distributions of generated crystals on test data compared with experimental ground-truth distributions. (e) Atoms learn to attend to nearby neighbors and, possibly, to distant atoms in different molecules that are symmetric replicas. "Query" atoms are colored red and "key" atoms green. (f) The average pairwise attention-score profile is spread uniformly at high flow times and peaks at small distances at low flow times, indicating a transition from global to local attention as the flow progresses.
Figure 4: Physics alignment (PA) post-training approach. (a) PackFlow-PA is trained by generating a set of observations $O_i$ from the current policy, evaluating heavy-atom energies $E_h$ and forces $F_h$ (approximations to all-atom energies and forces) with an MLIP, combining energy and force advantages via advantage mixing (details in (c)), and using the mixed advantages as feedback to update the policy. A KL-divergence regularization term between the current policy and the reference policy (PackFlow-Base) keeps post-training from drifting far from the base-model distribution. (b) Post-training significantly lowers $E_h$ and $F_h$ and reduces atomic clashes on test data. (c) Advantage mixing linearly interpolates advantages, rather than rewards, for multi-objective post-training. Because advantages are normalized quantities, they do not require the manual rescaling that is typically needed when rewards of different scales are mixed directly. (d) Varying the mixing parameter $\lambda$ allows a smooth trade-off in test performance between $E_h$ and $F_h$. Left: test performance as a function of PA step. Right: an example structure colored by per-atom energies and forces as a function of $\lambda$. Red/blue atoms indicate higher/lower $E_h$/$F_h$, respectively.
Figure 5: Comprehensive comparison of PackFlow and Genarris on two CSP blind-test examples. (a) PackFlow models generate initial proposals that are closer in density to the experimental structure than the Genarris baselines. Genarris Plain typically under-compresses, while Genarris Rigid Press over-compresses structures relative to experiment. Structures with the median density among predicted samples are visualized. (b) Both Genarris methods tend to produce proposals with lower average energies than PackFlow models prior to relaxation, but with under- and over-compressed densities relative to experiment. (c) PackFlow structures reach lower-lying minima than Genarris structures after MLIP relaxation. (d) After relaxation, PackFlow models yield polymorphs that are closer to experiment in density/lattice-energy space than Genarris polymorphs. Purple colors in the zoomed and violin subplots indicate the minimum distance to the experimental polymorph in the relative-lattice-energy-density plane; the "best" predictions across methods, rather than averages, are used for this comparison. All relative energies across figures are for hydrogenated crystals and are calculated with respect to the fully relaxed structure with the global minimum energy. All plotted points were obtained after applying a bond-length filter to remove implausible molecular geometries. Details of the visualization filters are given in the corresponding section of the paper.
...and 3 more figures
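The Figure 2 caption states that covalent bonding information enters the transformer as an additive bias on the attention scores. The single-head NumPy sketch below shows the mechanics under simplifying assumptions: a dense 0/1 bond adjacency matrix and one hypothetical scalar `bond_bias`. A real model might instead learn per-head or per-bond-type biases.

```python
import numpy as np

def bond_biased_attention(q, k, v, bond_adj, bond_bias):
    """Single-head scaled dot-product attention with an additive bond bias.

    q, k, v: (N, d) per-atom query/key/value vectors.
    bond_adj: (N, N) 0/1 matrix, 1 where atoms i and j are covalently bonded.
    bond_bias: hypothetical scalar (a learned parameter in a real model).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Additive bias: bonded pairs get their logits shifted by bond_bias.
    scores = scores + bond_bias * bond_adj
    # Numerically stable softmax over keys.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With all-zero queries and keys, a bias of ln 2 doubles the attention weight on a bonded partner relative to each non-bonded atom, while an atom with no bonds attends uniformly.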
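Panel (b) of Figure 2 describes making molecules whole across unit-cell boundaries and then centering by mean subtraction. One common way to do this, sketched here under stated assumptions, is a breadth-first walk over the covalent bond graph that places each newly visited atom at the minimum-image position relative to its already-placed bonded neighbor. The function name, the row-vector lattice convention, and centering on the mean of all atoms are assumptions, not details confirmed by the source.

```python
import numpy as np
from collections import deque

def unwrap_and_center(coords, lattice, bonds):
    """Make bonded molecules whole across periodic boundaries, then center.

    coords: (N, 3) Cartesian positions, possibly wrapped into the cell.
    lattice: (3, 3) matrix whose rows are the lattice vectors (assumption).
    bonds: iterable of (i, j) covalent bonds.
    """
    n = len(coords)
    inv = np.linalg.inv(lattice)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = np.array(coords, dtype=float)
    placed = np.zeros(n, dtype=bool)
    for root in range(n):
        if placed[root]:
            continue
        # Each connected component (molecule) is anchored at its first atom.
        placed[root] = True
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if placed[j]:
                    continue
                # Minimum-image displacement via rounding fractional coordinates;
                # exact when the bond is short relative to the cell widths.
                f = (coords[j] - out[i]) @ inv
                f -= np.round(f)
                out[j] = out[i] + f @ lattice
                placed[j] = True
                queue.append(j)
    # Assumption: center by subtracting the mean Cartesian position.
    return out - out.mean(axis=0)
```

For example, in a 10 Å cubic cell, a bonded pair wrapped to x = 9.5 Å and x = 0.5 Å is restored to a 1 Å bond and centered at the origin.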
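The Figure 4 caption describes physics alignment with group-relative advantages computed separately for heavy-atom energies and forces and then linearly interpolated with a mixing parameter $\lambda$. A minimal sketch follows. It assumes the per-sample rewards are the negated energy and the negated force magnitude, uses GRPO-style group normalization (subtract the group mean, divide by the group standard deviation), and lets $\lambda$ weight the energy term; PackFlow's actual reward definitions and $\lambda$ convention may differ.

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize rewards within one sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def mixed_advantages(energies, force_norms, lam):
    """Advantage mixing: interpolate normalized advantages, not raw rewards.

    energies, force_norms: per-sample heavy-atom energy and force magnitude
    for a group of structures generated for the same molecule.
    lam: mixing weight; the assumption here is that lam weights energy.
    """
    # Lower energy and lower forces are better, so negate them as rewards.
    a_e = group_advantages(-np.asarray(energies, dtype=float))
    a_f = group_advantages(-np.asarray(force_norms, dtype=float))
    return lam * a_e + (1.0 - lam) * a_f
```

Because each advantage is normalized within its group, multiplying the energies by 1000 (for instance, after a unit change) leaves the mixed advantages essentially unchanged. This is the scale robustness that the caption contrasts with mixing raw rewards directly.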
