FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble
Riccardo Tedoldi, Ola Engkvist, Patrick Bryant, Hossein Azizpour, Jon Paul Janet, Alessandro Tibo
TL;DR
FlexiFlow extends flow matching by decomposing the generation of molecular graphs and multiple conformers into independently tractable flows, enabling joint sampling of structures and conformational ensembles while preserving permutation invariance and SE(3) equivariance. The model combines invariant and equivariant coordinate handling with a two-coordinate architecture and a loss that couples coordinates and chemistry. It achieves state-of-the-art performance on QM9 and GEOM Drugs and produces diverse, low-energy conformers at a fraction of the cost of physics-based methods. It also transfers to protein-conditioned ligand generation, suggesting practical utility for structure-based drug design. Overall, FlexiFlow efficiently generates diverse, chemically valid molecular ensembles suitable for downstream discovery, and future work could integrate protein dynamics.
Abstract
Sampling useful three-dimensional molecular structures along with their most favorable conformations is a key challenge in drug discovery. Current state-of-the-art flow-matching and diffusion-based models for 3D de novo design are limited to generating a single conformation. However, the conformational landscape of a molecule determines its observable properties and how tightly it can bind to a given protein target. By generating a representative set of low-energy conformers, we can assess these properties more directly and potentially improve the ability to generate molecules with desired thermodynamic observables. Towards this aim, we propose FlexiFlow, a novel architecture that extends flow-matching models to jointly sample molecules and multiple conformations while preserving both equivariance and permutation invariance. We demonstrate the effectiveness of our approach on the QM9 and GEOM Drugs datasets, achieving state-of-the-art results in molecular generation tasks. Our results show that FlexiFlow can generate valid, unstrained, unique, and novel molecules with high fidelity to the training data distribution, while also capturing the conformational diversity of molecules. Moreover, we show that our model can generate conformational ensembles that provide coverage similar to that of state-of-the-art physics-based methods at a fraction of the inference time. Finally, FlexiFlow can be successfully transferred to the protein-conditioned ligand generation task, even when the dataset contains only static pockets without accompanying conformations.
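The abstract does not spell out the training objective, so the following is background only: a minimal sketch of generic conditional flow matching with a linear noise-to-data path, applied independently to each conformer of one molecule. It is not FlexiFlow's architecture or loss. The function names, the toy 3-atom coordinates, and the choice of a linear path are all assumptions made for illustration, and the graph (atom and bond type) flow and any equivariant network are omitted.

```python
import random

def center(coords):
    """Subtract the centroid so the toy example ignores global translation."""
    n = len(coords)
    mean = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in coords]

def gaussian_noise(n_atoms, rng):
    """Centered Gaussian starting coordinates for one conformer."""
    return center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)])

def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its constant velocity x1 - x0."""
    xt = [[(1 - t) * a + t * b for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]
    ut = [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]
    return xt, ut

def euler_integrate(x0, velocity_fn, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = [list(p) for p in x0]
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = velocity_fn(x, i * dt)
        x = [[a + dt * b for a, b in zip(p, q)] for p, q in zip(x, v)]
    return x

rng = random.Random(0)

# Two hypothetical conformers of the same 3-atom molecule (made-up coordinates).
conformers = [
    center([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.5, 0.9, 0.0]]),
    center([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.5, -0.9, 0.0]]),
]

# One independent noise sample per conformer; a model would be trained to
# regress the target velocity u_t from (x_t, t). Here we only build the targets.
noises = [gaussian_noise(3, rng) for _ in conformers]
t = 0.3
training_pairs = [interpolate(x0, x1, t) for x0, x1 in zip(noises, conformers)]
```

With the exact target velocity in place of a learned model, `euler_integrate` transports each noise sample onto its conformer, which is the behaviour a trained velocity network would approximate at sampling time.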
