EquiFlow: Equivariant Conditional Flow Matching with Optimal Transport for 3D Molecular Conformation Prediction

Qingwen Tian, Yuxin Xu, Yixuan Yang, Zhen Wang, Ziqi Liu, Pengju Yan, Xiaolin Li

TL;DR

EquiFlow introduces an SE(3)-equivariant OT-guided conditional flow matching framework for 3D molecular conformation prediction. It combines a modified Equiformer to encode rich atomic and bond features with an OT-CFM training objective and an ODE-based sampler to achieve fast inference and improved accuracy. The approach yields state-of-the-art results on QM9 and GEOM-QM9, demonstrating strong performance in both single-conformation accuracy (RMSD ≈ $0.17$ Å) and multi-conformation diversity/coverage metrics. This work provides a scalable, symmetry-preserving alternative to diffusion-based methods, with potential impact on drug design and materials science by enabling efficient end-to-end molecular conformation prediction with high fidelity.

Abstract

Molecular 3D conformations play a key role in determining how molecules interact with other molecules or protein surfaces. Recent deep learning advancements have improved conformation prediction, but slow training speeds and difficulties in utilizing high-degree features limit performance. We propose EquiFlow, an equivariant conditional flow matching model with optimal transport. EquiFlow uniquely applies conditional flow matching in molecular 3D conformation prediction, leveraging simulation-free training to address slow training speeds. It uses a modified Equiformer model to encode Cartesian molecular conformations along with their atomic and bond properties into higher-degree embeddings. Additionally, EquiFlow employs an ODE solver, providing faster inference speeds compared to diffusion models with SDEs. Experiments on the QM9 dataset show that EquiFlow predicts small molecule conformations more accurately than current state-of-the-art models.

Paper Structure

This paper contains 31 sections, 20 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: The architecture of EquiFlow. Fixed atom and bond types serve as conditions, and Flow Matching on the atomic coordinates produces $x_t$, $t$, and $u_t$. After the relative position features and time features are embedded along with the conditions, a modified Equiformer predicts the vector field $v_t$ around the atoms. The predicted vector field $v_t$ is then fitted to the ground-truth vector field $u_t$ obtained from Flow Matching using a Mean Squared Error (MSE) loss. For the training procedure, see the training algorithm; for the sampling procedure, see the sampling algorithm.
  • Figure 2: The process of Equivariant OT-CFM. We perform optimal transport (OT) between different conformations of the same molecule to obtain the mapping that minimizes the transport cost between the Gaussian noise coordinates $x_0$ and the true conformation coordinates $x_1$. We then calculate the conditional probability path and the corresponding conditional vector field during the CFM process. Note that both $x_0$ and $x_1$ need to be centered using the Zero Center-of-Mass (Zero CoM) operation to ensure translational equivariance, and the Kabsch algorithm (Kabsch, 1976) is used for rotational alignment.
  • Figure 3: Atom, Bond, and Time Embedding Blocks in the Modified Equiformer. We embed the input 3D molecular graph with Atom Type, Bond Type, Relative Pos, and Time embeddings before the transformer blocks, which consist of SO(2)-equivariant graph attention and feed-forward networks.
  • Figure 4: Samples from the single-conformation prediction on QM9 dataset. We selected 10 molecules, with SMILES from left to right as follows: C1CC2(CO2)C12COC2, CC(C)(C)COCC#N, CC(O)C(C)(C#N)C=O, CC1CC(=O)NC1N, CC12CC1OC(=O)C2N, CCC12CC3C(C1O)N32, O=C1OC=C(F)OC1=O, O=C1OC2C3COC3C12, O=CC1C2NC2C12CC2, OC1C2CCC3OC3C12.
  • Figure 5: Samples from the multi-conformation prediction on the GEOM-QM9 dataset. We selected 5 conformations from 2 molecules for display. The molecule on the left, C#CCC[C@@H]1CNC1=O, has 9 conformations in total, while the molecule on the right, C#CCCC@(O)CC, has 104 conformations.
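
The captions for Figures 1 and 2 describe centering with Zero CoM, rotational alignment with the Kabsch algorithm, and fitting $v_t$ to $u_t$ with an MSE loss. The NumPy sketch below illustrates those steps for a single noise/conformation pair. It is not the authors' code: the function names are hypothetical, the straight-line path $x_t = (1 - t)x_0 + t x_1$ with target $u_t = x_1 - x_0$ is an assumed (noise-free) form of the conditional path, and the OT pairing across conformations is omitted.

```python
import numpy as np


def zero_com(x):
    """Zero CoM: subtract the centroid from an (N, 3) coordinate array."""
    return x - x.mean(axis=0, keepdims=True)


def kabsch_rotation(p, q):
    """Rotation R (3x3) that best maps centered points p onto centered points q.

    Minimizes sum_i ||R p_i - q_i||^2 over proper rotations (det R = +1).
    """
    h = p.T @ q                          # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def align_pair(x0, x1):
    """Center both point sets, then rotate x0 onto x1."""
    p, q = zero_com(x0), zero_com(x1)
    r = kabsch_rotation(p, q)
    return p @ r.T, q


def cfm_training_pair(x0, x1, t):
    """Return (x_t, u_t) for an aligned pair.

    Assumption: linear conditional path without added noise,
    x_t = (1 - t) * x0 + t * x1 and u_t = x1 - x0.
    """
    x0_aligned, x1_centered = align_pair(x0, x1)
    x_t = (1.0 - t) * x0_aligned + t * x1_centered
    u_t = x1_centered - x0_aligned
    return x_t, u_t


def mse_loss(v_t, u_t):
    """Mean squared error between predicted and target vector fields."""
    return float(np.mean((v_t - u_t) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=(5, 3))         # stand-in for a true conformation
    x0 = rng.normal(size=(5, 3))         # Gaussian noise coordinates
    x_t, u_t = cfm_training_pair(x0, x1, t=0.3)
    # A real model would predict v_t from (x_t, t, atom/bond types);
    # a zero prediction is used here only to exercise the loss.
    print(mse_loss(np.zeros_like(u_t), u_t))
```

Because both point sets are centered before interpolation, every intermediate $x_t$ also has zero center of mass, which is what keeps the path consistent under translations.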