Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport

Haokai Hong, Wanyu Lin, Kay Chen Tan

TL;DR

This work tackles fast and accurate 3D molecule generation by framing it as geometric optimal transport (OT) in a joint flow-matching setting. GOAT embeds multi-modal molecular data (coordinates and atom features) into a latent, equivariant space and separately solves Optimal Molecule Transport (OMT) and Optimal Distribution Transport (ODT) to produce a straight, low-cost transport path. A purification step filters out low-quality molecules, and a theoretical result guarantees non-increasing geometric transport cost. Empirically, GOAT achieves substantial speedups (roughly 2x faster sampling and up to an 89.65% reduction in transport cost) and state-of-the-art or competitive generation quality on QM9 and GEOM-DRUG, with notable gains in novelty and significance over baselines. The approach combines a latent equivariant autoencoder, Hungarian-based permutation alignment, Kabsch rotation, and a purification-enhanced coupling to enable scalable, controllable 3D molecule generation for de novo design. The code is publicly released.
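The Kabsch rotation mentioned above can be sketched as follows. This is a minimal illustration of the standard Kabsch algorithm using NumPy, not the authors' implementation: the function name `kabsch_align` is hypothetical, and the sketch assumes atoms are already matched one-to-one (the Hungarian permutation step is omitted) and operates on raw 3D coordinates rather than GOAT's latent space.

```python
import numpy as np

def kabsch_align(source, target):
    """Rigidly align `source` (N x 3) onto `target` (N x 3) in the least-squares sense.

    Assumes row i of `source` corresponds to row i of `target`.
    Returns the aligned copy of `source` and the 3 x 3 rotation matrix.
    """
    src_center = source.mean(axis=0)
    tgt_center = target.mean(axis=0)
    src = source - src_center          # remove translation
    tgt = target - tgt_center

    H = src.T @ tgt                    # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation (det = +1),
    # not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    aligned = src @ R.T + tgt_center   # rotate row vectors, then restore translation
    return aligned, R
```

After alignment, the remaining coordinate distance between the two point sets no longer includes any contribution from a global rotation or translation, which is what a rotation-invariant transport cost requires.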

Abstract

This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data distribution. Our formula is solved within a joint, equivariant, and smooth representation space. This is achieved by transforming the multi-modal features into a continuous latent space with equivariant networks. In addition, we find that identifying the optimal distributional coupling is necessary for fast and effective transport between any two distributions. We further propose a mechanism for estimating and purifying the optimal coupling to train the flow model with optimal transport. By doing so, GOAT can turn arbitrary distribution couplings into new deterministic couplings, leading to an estimated optimal transport plan for fast 3D molecule generation. The purification filters out subpar molecules to ensure the final generation quality. We show theoretically and empirically that the proposed optimal coupling estimation and purification yield a transport plan with non-increasing cost. Finally, extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a twofold speedup compared to the sub-optimal method while achieving the best generation quality regarding validity, uniqueness, and novelty. The code is available at https://github.com/WanyuGroup/ICLR2025-GOAT.
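To illustrate why a straight transport path permits fast sampling, the sketch below integrates two toy velocity fields with the explicit Euler method. It is a generic flow-matching illustration under stated assumptions, not GOAT's model: the names `euler_integrate`, `straight_velocity`, and `curved_velocity` are hypothetical, and both fields are hand-written 2D examples rather than learned networks. Along the linear interpolation $z_t = (1-t)z_0 + t z_1$ the velocity is the constant $z_1 - z_0$, so a single Euler step lands on the target, whereas a curved path needs many steps for comparable accuracy.

```python
import math

def euler_integrate(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Straight path: constant velocity z1 - z0 (toy endpoints, chosen for illustration).
z0 = [0.0, 0.0]
z1 = [1.0, 2.0]

def straight_velocity(x, t):
    return [b - a for a, b in zip(z0, z1)]

# Curved path: a quarter-circle rotation carrying (1, 0) to (0, 1).
def curved_velocity(x, t):
    w = math.pi / 2
    return [-w * x[1], w * x[0]]

straight_err_1 = math.dist(euler_integrate(straight_velocity, z0, 1), z1)
curved_err_1 = math.dist(euler_integrate(curved_velocity, [1.0, 0.0], 1), [0.0, 1.0])
curved_err_100 = math.dist(euler_integrate(curved_velocity, [1.0, 0.0], 100), [0.0, 1.0])

print(f"straight path, 1 step:   error = {straight_err_1:.2e}")
print(f"curved path,   1 step:   error = {curved_err_1:.2e}")
print(f"curved path,   100 steps: error = {curved_err_100:.2e}")
```

The curved field only approaches its endpoint as the step count grows, which mirrors the large number of sampling steps typically needed by diffusion-style paths.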

Paper Structure

This paper contains 22 sections, 1 theorem, 14 equations, 7 figures, 8 tables, 4 algorithms.

Key Result

Theorem 3.1

The coupling $\hat{\Gamma}$ incurs no larger geometric transport cost than the random coupling $\Gamma(p_{0},p_{1})$ in that $\mathbf{E}[\hat{c}_{g}(\mathbf{z}_{0},\mathbf{z}'_{1})]\leq\mathbf{E}[\hat{c}_{g}(\mathbf{z}_{0}, \mathbf{z}_{1})]$, where $(\mathbf{z}_{0},\mathbf{z}'_{1})=\hat{\Gamma}(p_{0
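The inequality in Theorem 3.1 can be illustrated on a small batch. The sketch below is a toy demonstration, not the paper's procedure or its proof: plain squared Euclidean distance stands in for the geometric cost $\hat{c}_{g}$, the optimal pairing is found by brute force over permutations instead of a scalable solver, and the names `coupling_cost` and `optimal_permutation` are hypothetical. Because the arbitrary pairing is one of the candidates the search considers, the optimized coupling can never cost more.

```python
import itertools
import random

def sq_dist(a, b):
    """Squared Euclidean distance (stand-in for the geometric cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def coupling_cost(sources, targets, perm):
    """Mean cost when source i is paired with target perm[i]."""
    return sum(sq_dist(sources[i], targets[j]) for i, j in enumerate(perm)) / len(sources)

def optimal_permutation(sources, targets):
    """Exhaustively search all pairings; only feasible for very small batches."""
    n = len(sources)
    return min(itertools.permutations(range(n)),
               key=lambda p: coupling_cost(sources, targets, p))

rng = random.Random(0)
n, dim = 6, 3
z0 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]  # base samples
z1 = [[rng.gauss(2.0, 1.0) for _ in range(dim)] for _ in range(n)]  # target samples

# Independently drawn samples paired by index act as the random coupling.
random_perm = tuple(range(n))
best_perm = optimal_permutation(z0, z1)

random_cost = coupling_cost(z0, z1, random_perm)
optimal_cost = coupling_cost(z0, z1, best_perm)
print(f"random coupling cost:  {random_cost:.4f}")
print(f"optimal coupling cost: {optimal_cost:.4f}")
```

For realistic batch sizes the exhaustive search would be replaced by an assignment solver such as the Hungarian algorithm, which returns the same minimizer in polynomial time.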

Figures (7)

  • Figure 1: Illustration of probability paths under different molecule generative models. 1. The diffusion path (EDM, GeoLDM), which typically deviates from a straight-line map, necessitates a large number of sampling steps. 2. The hybrid transport (EquiFM) ensures straight transport for atomic coordinates, but it does not guarantee the same for atom features. Furthermore, its cost does not consider the optimal distribution couplings, leading to suboptimal transport between distributions. 3. GOAT simultaneously considers the optimal transport for atom coordinates and features, providing a joint and straight path for fast sampling.
  • Figure 2: Illustration of the difference between straight coupling and optimal coupling. GOAT approximates the optimal coupling for fast generation.
  • Figure 2: Comparisons of generation quality regarding Atom Stability, Validity, Steps, and Time on GEOM-DRUG. The best results are highlighted in bold.
  • Figure 3: MAE for molecular property prediction. A lower number indicates a better controllable generation result. The best results are highlighted in bold.
  • Figure 4: Quality vs. Speed ($\alpha$). GOAT shows the optimal trade-off between generation quality and speed.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Theorem 3.1
  • proof