Stiefel Flow Matching for Moment-Constrained Structure Elucidation

Austin Cheng, Alston Lo, Kin Long Kelvin Lee, Santiago Miret, Alán Aspuru-Guzik

TL;DR

Stiefel Flow Matching addresses structure elucidation from a molecular formula and exact moments of inertia by embedding the moment-constrained feasible space into the Stiefel manifold $\mathrm{St}(n,4)$ and learning a Riemannian flow that exactly preserves these constraints. The method leverages reflection and permutation equivariance and introduces an equivariant optimal-transport (OT) objective to yield shorter generation paths. Empirically, Stiefel Flow Matching achieves higher success rates and lower sampling costs than Euclidean diffusion baselines on QM9 and GEOM, with OT offering additional gains in trajectory efficiency. The work demonstrates the practical potential of exact manifold constraints for robust 3D molecular structure generation and points to future directions in alternative Riemannian paths, discrete conditioning, and real-world conditioning data.

Abstract

Molecular structure elucidation is a fundamental step in understanding chemical phenomena, with applications in identifying molecules in natural products, lab syntheses, forensic samples, and the interstellar medium. We consider the task of predicting a molecule's all-atom 3D structure given only its molecular formula and moments of inertia, motivated by the ability of rotational spectroscopy to measure these moments. While existing generative models can conditionally sample 3D structures with approximately correct moments, this soft conditioning fails to leverage the many digits of precision afforded by experimental rotational spectroscopy. To address this, we first show that the space of $n$-atom point clouds with a fixed set of moments of inertia is embedded in the Stiefel manifold $\mathrm{St}(n, 4)$. We then propose Stiefel Flow Matching as a generative model for elucidating 3D structure under exact moment constraints. Additionally, we learn simpler and shorter flows by finding approximate solutions for equivariant optimal transport on the Stiefel manifold. Empirically, enforcing exact moment constraints allows Stiefel Flow Matching to achieve higher success rates and faster sampling than Euclidean diffusion models, even on high-dimensional manifolds corresponding to large molecules in the GEOM dataset.
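The embedding claim can be checked numerically. The sketch below is one plausible construction, not necessarily the paper's exact one: it assumes the point cloud is centered at its center of mass, rotated into its principal-axis frame, mass-weighted, and scaled by the square roots of its planar moments, with a fourth column $\sqrt{m_i / M}$ encoding the center-of-mass constraint. The helper name `stiefel_embedding` and the toy masses are illustrative assumptions.

```python
import numpy as np


def stiefel_embedding(coords, masses):
    """Hypothetical helper: map an n-atom point cloud to an n x 4 matrix
    with orthonormal columns. Assumes n >= 4 and a non-planar point cloud
    (all three planar moments strictly positive)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    total_mass = masses.sum()

    # Shift the center of mass to the origin.
    center_of_mass = (masses[:, None] * coords).sum(axis=0) / total_mass
    centered = coords - center_of_mass

    # Mass-weight the coordinates: row i is sqrt(m_i) * x_i.
    weighted = np.sqrt(masses)[:, None] * centered

    # Diagonalize the 3 x 3 second-moment matrix. Its eigenvalues are the
    # planar moments; its eigenvectors define the principal-axis frame.
    planar, axes = np.linalg.eigh(weighted.T @ weighted)
    principal = weighted @ axes

    u = np.empty((coords.shape[0], 4))
    u[:, :3] = principal / np.sqrt(planar)  # unit-norm, mutually orthogonal
    u[:, 3] = np.sqrt(masses / total_mass)  # orthogonal to the rest (zero COM)
    return u, planar


# Toy example with made-up coordinates and approximate atomic masses.
rng = np.random.default_rng(0)
coords = rng.normal(size=(6, 3))
masses = np.array([12.0, 12.0, 16.0, 1.008, 1.008, 14.0])

u, planar = stiefel_embedding(coords, masses)
# Principal moments of inertia follow from planar moments: I_a = P_b + P_c.
inertia = planar.sum() - planar
orthonormal = np.allclose(u.T @ u, np.eye(4), atol=1e-10)
```

Under these assumptions, every point cloud sharing the same masses and planar moments maps to a matrix satisfying $U^\top U = I_4$, so fixing the moments amounts to restricting generation to such matrices whose fourth column is pinned to $\sqrt{m_i / M}$.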

Paper Structure

This paper contains 32 sections, 5 theorems, 41 equations, 11 figures, 6 tables, 4 algorithms.

Key Result

Theorem 1

As defined above, $\mathcal{M}$ is totally geodesic.

Figures (11)

  • Figure 1: Stiefel Flow Matching learns to elucidate 3D molecular structure from moments and molecular formula alone by transforming uniform Stiefel noise $\bm{X}_0$ into valid molecular structures $\bm{X}_1$. Generative modelling on the Stiefel manifold $\mathrm{St}(n, 4)$ guarantees that samples always have the correct moments of inertia, which allows the network to focus only on generating chemically stable structures. Within the intersection of these spaces lies the true 3D structure.
  • Figure 2: Histograms of minimum RMSD for predicted QM9 examples show two distinct clusters for RMSD. The 0.25 Å threshold captures molecular structures that are useful for structure elucidation.
  • Figure 3: (Left) Learned sampling trajectories for Stiefel FM and Stiefel FM-OT on QM9. Each column begins generation from the same noise. (Right) Histogram of curve lengths of all QM9 sampling trajectories for Stiefel FM and Stiefel FM-OT. Permutation and reflection alignment lead to simpler and shorter paths.
  • Figure 4: Histograms of minimum RMSD for predicted GEOM examples.
  • Figure 5: Selected QM9 examples. Best viewed zoomed in. Examples are sorted by RMSD to ground truth, which is shown in cyan. Green panels indicate meeting the threshold of 0.25 Å.
  • ...and 6 more figures
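The permutation and reflection alignment mentioned in the Figure 3 caption can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's procedure: it uses a squared Euclidean cost rather than a geodesic distance on the Stiefel manifold, searches all eight axis sign flips exhaustively, and solves an exact assignment problem within each element. The function name `align_noise_to_target` is hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def align_noise_to_target(x0, x1, atom_types):
    """Hypothetical helper: pick the axis sign flips and same-element
    permutation of x0 that minimize the squared Euclidean distance to x1."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    atom_types = np.asarray(atom_types)

    best_cost, best_aligned = np.inf, None
    for signs in itertools.product((1.0, -1.0), repeat=x0.shape[1]):
        flipped = x0 * np.array(signs)  # reflect the chosen axes
        perm = np.arange(len(x0))
        cost = 0.0
        # Only atoms of the same element (same mass) may be exchanged.
        for element in np.unique(atom_types):
            idx = np.flatnonzero(atom_types == element)
            diff = x1[idx][:, None, :] - flipped[idx][None, :, :]
            pair_cost = (diff ** 2).sum(axis=-1)  # rows: target, cols: noise
            rows, cols = linear_sum_assignment(pair_cost)
            perm[idx[rows]] = idx[cols]
            cost += pair_cost[rows, cols].sum()
        if cost < best_cost:
            best_cost, best_aligned = cost, flipped[perm]
    return best_aligned, best_cost


# Toy check: scramble a target by a same-element permutation and one
# reflection, then recover it.
rng = np.random.default_rng(1)
atom_types = np.array(["C", "C", "H", "H", "H", "O"])
x1 = rng.normal(size=(6, 3))
x0 = x1[[1, 0, 3, 4, 2, 5]] * np.array([1.0, -1.0, 1.0])

aligned, cost = align_noise_to_target(x0, x1, atom_types)
```

Both operations keep a sample feasible: flipping the sign of a column or swapping rows of equal mass leaves the columns orthonormal, so the aligned noise still satisfies the moment constraints while starting closer to the target.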

Theorems & Definitions (10)

  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5
  • proof