Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport
Haokai Hong, Wanyu Lin, Kay Chen Tan
TL;DR
This work addresses fast, accurate 3D molecule generation by casting it as a geometric optimal transport (OT) problem within a joint flow-matching framework. The proposed method, GOAT, embeds multi-modal molecular data (atom coordinates and atom features) into an equivariant latent space and solves two subproblems separately, Optimal Molecule Transport (OMT) and Optimal Distribution Transport (ODT), to obtain a straight, low-cost transport path. A purification step filters out low-quality generated molecules, and a theoretical result shows that the geometric transport cost does not increase. Empirically, GOAT reports roughly 2x faster sampling and up to an 89.65% reduction in transport cost, with state-of-the-art or competitive generation quality on QM9 and GEOM-DRUG and notable gains in novelty and significance over baselines. The approach combines a latent equivariant autoencoder, Hungarian-based permutation alignment, Kabsch rotation alignment, and purification-enhanced coupling to support scalable, controllable 3D molecule generation for de novo design. The code is publicly released.
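To make the molecule-level alignment concrete, the following is a minimal sketch of pairing one noise point cloud with one target point cloud using the Hungarian algorithm followed by a Kabsch rotation. It is an illustration under simplifying assumptions, not the released GOAT implementation: it uses raw 3D coordinates only (the paper works in a latent space that also carries atom features), and the function names `kabsch_rotation` and `align_pair` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def kabsch_rotation(P, Q):
    """Rotation R (3x3) minimizing ||P @ R.T - Q||_F for centered P, Q of shape (N, 3)."""
    H = P.T @ Q                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T


def align_pair(x0, x1):
    """Align noise coordinates x0 to target coordinates x1 (both of shape (N, 3)).

    Assumption: the cost is the squared Euclidean distance on coordinates only.
    Returns the aligned noise, the centered target, and the resulting cost.
    """
    # Remove translation by centering both point clouds.
    x0 = x0 - x0.mean(axis=0)
    x1 = x1 - x1.mean(axis=0)

    # Hungarian step: match each noise atom to one target atom.
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
    _, col = linear_sum_assignment(cost)   # noise atom i is matched to target atom col[i]
    perm = np.argsort(col)                 # inverse permutation: target j gets noise perm[j]
    x0_perm = x0[perm]

    # Kabsch step: rotate the permuted noise onto the target.
    R = kabsch_rotation(x0_perm, x1)
    x0_aligned = x0_perm @ R.T

    transport_cost = float(((x0_aligned - x1) ** 2).sum())
    return x0_aligned, x1, transport_cost
```

Because the identity permutation and the identity rotation are both feasible choices, each step can only keep or lower the squared-distance cost relative to the unaligned, centered pair.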
Abstract
This paper proposes GOAT, a new framework for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport cost for mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data distribution. This formulation is solved within a joint, equivariant, and smooth representation space, obtained by transforming the multi-modal features into a continuous latent space with equivariant networks. In addition, we find that identifying the optimal distributional coupling is necessary for fast and effective transport between any two distributions. We therefore propose a mechanism for estimating and purifying the optimal coupling so that the flow model can be trained with optimal transport. In this way, GOAT turns arbitrary distribution couplings into new deterministic couplings, yielding an estimated optimal transport plan for fast 3D molecule generation. The purification step filters out subpar molecules to safeguard final generation quality. We show theoretically and empirically that the proposed coupling estimation and purification yield a transport plan with non-increasing cost. Finally, extensive experiments show that GOAT benefits from efficiently solving geometric optimal transport, achieving a twofold speedup over the sub-optimal method while attaining the best generation quality in terms of validity, uniqueness, and novelty. The code is available at https://github.com/WanyuGroup/ICLR2025-GOAT.
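The coupling estimation and purification idea described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' code: `velocity_fn` stands in for a trained flow model, `is_valid` stands in for a molecule quality check, a plain Euler integrator replaces whatever ODE solver the paper uses, and all function names are hypothetical.

```python
import numpy as np


def euler_sample(velocity_fn, z0, n_steps=50):
    """Integrate dz/dt = velocity_fn(z, t) from t=0 to t=1 with fixed-step Euler."""
    z = z0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        z = z + dt * velocity_fn(z, k * dt)
    return z


def estimate_and_purify_coupling(velocity_fn, z0_batch, is_valid, n_steps=50):
    """Turn noise samples into deterministic (noise, sample) pairs, then filter them.

    velocity_fn: hypothetical callable (z, t) -> array with the same shape as z.
    is_valid:    hypothetical callable (single sample) -> bool, e.g. a validity check.
    """
    z1_batch = euler_sample(velocity_fn, z0_batch, n_steps)
    keep = np.array([bool(is_valid(z1)) for z1 in z1_batch], dtype=bool)
    # Purification: drop pairs whose generated endpoint fails the check.
    return z0_batch[keep], z1_batch[keep]


def flow_matching_targets(z0, z1, t):
    """Straight-path interpolant and its constant velocity target.

    z0, z1: arrays of shape (B, ...); t: array of shape (B,) with values in [0, 1].
    """
    t = t.reshape(-1, *([1] * (z0.ndim - 1)))  # broadcast t over non-batch dims
    z_t = (1.0 - t) * z0 + t * z1
    v_target = z1 - z0
    return z_t, v_target
```

In this sketch, the purified pairs would serve as the coupling for a further round of flow-matching training, with `flow_matching_targets` supplying the regression inputs and targets along the straight path between each pair.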
