FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble

Riccardo Tedoldi, Ola Engkvist, Patrick Bryant, Hossein Azizpour, Jon Paul Janet, Alessandro Tibo

TL;DR

FlexiFlow extends flow matching by decomposing the generation of molecular graphs and multiple conformers into independently tractable flows, enabling joint sampling of structures and conformational ensembles while preserving permutation invariance and SE(3) equivariance. The model combines invariant and equivariant coordinate handling with a two-coordinate architecture and a loss that couples coordinates and chemistry. It achieves state-of-the-art performance on QM9 and GEOM Drugs and produces diverse, low-energy conformers at a fraction of the cost of physics-based methods. It also transfers to protein-conditioned ligand generation, suggesting practical utility for structure-based drug design. Overall, FlexiFlow offers efficient, diverse, and chemically valid molecular ensembles suitable for downstream discovery, with potential to integrate protein dynamics in future work.

Abstract

Sampling useful three-dimensional molecular structures along with their most favorable conformations is a key challenge in drug discovery. Current state-of-the-art flow-matching and diffusion-based models for 3D de novo design are limited to generating a single conformation. However, the conformational landscape of a molecule determines its observable properties and how tightly it can bind to a given protein target. By generating a representative set of low-energy conformers, we can more directly assess these properties and potentially improve the ability to generate molecules with desired thermodynamic observables. Towards this aim, we propose FlexiFlow, a novel architecture that extends flow-matching models, allowing for the joint sampling of molecules along with multiple conformations while preserving both equivariance and permutation invariance. We demonstrate the effectiveness of our approach on the QM9 and GEOM Drugs datasets, achieving state-of-the-art results in molecular generation tasks. Our results show that FlexiFlow can generate valid, unstrained, unique, and novel molecules with high fidelity to the training data distribution, while also capturing the conformational diversity of molecules. Moreover, we show that our model can generate conformational ensembles that provide similar coverage to state-of-the-art physics-based methods at a fraction of the inference time. Finally, FlexiFlow can be successfully transferred to the protein-conditioned ligand generation task, even when the dataset contains only static pockets without accompanying conformations.

Paper Structure

This paper contains 46 sections, 1 theorem, 68 equations, 27 figures, 4 tables, 2 algorithms.

Key Result

Theorem 4.1

The FlexiFlow model is equivariant with respect to the coordinates $x$ and $y$. Let $\tilde{x}$ and $\tilde{y}$ be the normalized coordinate sets as defined in Equation eq:equivariance, and let $R_x \in SO(3)$ and $R_y \in SO(3)$ be rotation matrices. The only exchange of information between $x$ and [...] The same argument applies to $\tilde{y}$. Since SemlaFlow (DBLP:conf/aistats/IrwinTJO25) is equivariant [...]
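The property stated in the theorem can be checked numerically on a toy example. The sketch below is not the FlexiFlow or SemlaFlow architecture: `toy_update` is a hypothetical stand-in layer, and it assumes that the two coordinate sets exchange information only through rotation-invariant pairwise distances. It also skips the coordinate normalization mentioned in the theorem. Under those assumptions, rotating $x$ by $R_x$ and $y$ by an independent $R_y$ before the update should give the same result as rotating the outputs afterwards.

```python
import numpy as np


def random_rotation(rng):
    """Sample a proper rotation matrix (orthogonal, det = +1)."""
    # QR of a Gaussian matrix yields an orthogonal Q; fix column signs,
    # then flip one column if the determinant is -1 (a reflection).
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def pairwise_distances(coords):
    """Return the (N, N) matrix of Euclidean distances between atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def toy_update(x, y):
    """Hypothetical layer: x and y interact only via invariant distances."""
    # Invariant weights built from BOTH coordinate sets (assumption).
    weights = np.exp(-(pairwise_distances(x) + pairwise_distances(y)))

    def step(c):
        # Equivariant update: invariant weights times relative vectors.
        rel = c[:, None, :] - c[None, :, :]
        return c + (weights[:, :, None] * rel).sum(axis=1)

    return step(x), step(y)


def check_equivariance(n_atoms=5, seed=0):
    """Compare 'rotate then update' with 'update then rotate'."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_atoms, 3))
    y = rng.normal(size=(n_atoms, 3))
    r_x, r_y = random_rotation(rng), random_rotation(rng)

    out_x, out_y = toy_update(x, y)
    rot_x, rot_y = toy_update(x @ r_x.T, y @ r_y.T)

    ok_x = np.allclose(rot_x, out_x @ r_x.T)
    ok_y = np.allclose(rot_y, out_y @ r_y.T)
    return bool(ok_x and ok_y)


print(check_equivariance())
```

The check passes for independent rotations because the only quantities shared between the two branches are distances, which no rotation changes. If the update instead mixed raw vectors from $x$ into $y$, the test would fail whenever $R_x \neq R_y$.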

Figures (27)

  • Figure 1: From noise samples, our model generates both molecular graphs and their conformational ensembles.
  • Figure 2: The FlexiFlow architecture takes equivariant and invariant features as input at time $t \in [0,1]$ and produces predictions at $t=1$. The left macro dashed block is the featurization layer. Blocks in the same column with the same color share weights. Solid blocks represent invariant features, while dashed blocks represent equivariant features. The message-computation and feature-refinement layer blocks produce both invariant and equivariant features.
  • Figure 3: RMSD before and after energy minimization.
  • Figure 4: Cov (left) and AMR (right) precision–recall for GEOM Drugs, comparing CREST-generated conformers with RDKit, FlexiFlow, and Adjoint Sampling (AS); see the appendix on conformer metrics for metric details.
  • Figure 5: The top right shows the 2D graph of the generated molecule for illustration purposes. The rest of the grid shows the reference graph with its conformer ($x$), followed by a set of generated conformers ($y$) for the same generated molecular graph. The $y$ conformers are aligned to the $x$ conformer for better visualization, and we report the energy and the RMSD between each generated conformer ($y$) and the reference one ($x$).
  • ...and 22 more figures
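
The Cov and AMR metrics named in the Figure 4 caption can be sketched as follows. This uses the definitions commonly found in the conformer-generation literature, which may differ in detail from those in the paper's appendix: coverage (Cov) is the fraction of conformers in one set matched within an RMSD threshold by some conformer in the other set, and average minimum RMSD (AMR) is the mean of the best-match RMSDs. The function name `cov_amr`, the input matrix, and the threshold value are illustrative assumptions.

```python
import numpy as np


def cov_amr(rmsd, delta):
    """Compute Cov and AMR in their recall and precision variants.

    rmsd  : array of shape (n_reference, n_generated), where entry (i, j)
            is the RMSD between reference conformer i and generated j.
    delta : RMSD threshold below which two conformers count as a match.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    # Recall view: how well is each reference conformer reproduced?
    min_per_ref = rmsd.min(axis=1)
    # Precision view: how close is each generated conformer to a reference?
    min_per_gen = rmsd.min(axis=0)
    return {
        "cov_recall": float((min_per_ref < delta).mean()),
        "amr_recall": float(min_per_ref.mean()),
        "cov_precision": float((min_per_gen < delta).mean()),
        "amr_precision": float(min_per_gen.mean()),
    }


# Made-up RMSD values (in angstroms) for 2 reference and 3 generated conformers.
example = [[0.2, 1.5, 0.9],
           [1.1, 0.6, 1.3]]
metrics = cov_amr(example, delta=0.75)  # threshold chosen for illustration
print(metrics)
```

In this toy example both reference conformers have a generated match below the threshold, so recall coverage is 1.0, while only two of the three generated conformers lie close to a reference, so precision coverage is 2/3.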

Theorems & Definitions (1)

  • Theorem 4.1: Equivariance