Flow-Matching Based Refiner for Molecular Conformer Generation
Xiangyang Xu, Hongyang Gao
TL;DR
This work tackles low-energy molecular conformer generation by augmenting denoising-based flow matching with a refinement stage that starts from upstream conformers and uses a rescheduled noise scale to bypass the poorly trained high-noise regime. The proposed Flow-Matching-Based Refiner defines a data-centered base distribution $\mathbf x_0 = \mathbf x_1 + \sigma\boldsymbol{\varepsilon}$ with $\sigma=1$ and trains a time-conditioned velocity field $\mathbf u_\theta$ to progressively refine conformers. This enables self-calibration via $t^*=1-\sigma^*/\sigma$ and stable sampling by integrating $\frac{d}{dt}\mathbf x_t = \mathbf u_\theta(\mathbf x_t,t,\mathcal{G})$. Empirically, on GEOM-Drugs and GEOM-QM9, the generator-refiner pipeline achieves higher sample quality (lower average minimum RMSD, AMR) and preserves diversity (higher coverage, COV) with fewer total steps, and it improves chemical-property realism as measured by xTB metrics. The approach offers practical gains in sampling efficiency and robustness for drug-discovery workflows by reducing error accumulation during early denoising while maintaining ensemble diversity.
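One way to read the relation $t^*=1-\sigma^*/\sigma$ is sketched below. It assumes the straight-line interpolant commonly used in flow matching; the text above does not state the path, so this is an assumption rather than the authors' stated derivation. Here $\sigma^*$ is interpreted as the noise level attributed to an upstream conformer.

$$
\mathbf x_t = (1-t)\,\mathbf x_0 + t\,\mathbf x_1 = \mathbf x_1 + (1-t)\,\sigma\,\boldsymbol{\varepsilon},
\qquad
(1-t^*)\,\sigma = \sigma^* \;\Longrightarrow\; t^* = 1 - \frac{\sigma^*}{\sigma}.
$$

Under this reading the residual noise scale at time $t$ is $(1-t)\sigma$. For example, with $\sigma=1$ and a hypothetical $\sigma^*=0.2$, the start time is $t^*=0.8$, so only the final 20% of the time interval needs to be integrated.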
Abstract
Low-energy molecular conformer generation (MCG) is a foundational yet challenging problem in drug discovery. Denoising-based methods, including diffusion and flow matching, learn mappings from a simple base distribution to the molecular conformer distribution. However, these approaches often suffer from error accumulation during sampling, especially in the low signal-to-noise-ratio (SNR) steps, which are hard to train. To address these challenges, we propose a flow-matching refiner for the MCG task. The proposed method initializes sampling from the mixed-quality outputs of upstream denoising models and reschedules the noise scale to bypass the low-SNR phase, thereby improving sample quality. On the GEOM-QM9 and GEOM-Drugs benchmark datasets, the generator-refiner pipeline improves quality with fewer total denoising steps while preserving diversity.
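The idea of starting sampling from an upstream conformer and skipping the low-SNR phase can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The names `start_time`, `refine`, and `oracle_velocity` are hypothetical. The sketch assumes a straight-line flow-matching path, a plain Euler integrator, and a toy oracle velocity that already knows the target conformer in place of a trained, graph-conditioned network $\mathbf u_\theta(\mathbf x_t,t,\mathcal{G})$.

```python
import numpy as np


def start_time(sigma_star: float, sigma: float = 1.0) -> float:
    """Start time t* = 1 - sigma*/sigma, clipped to [0, 1]."""
    return float(np.clip(1.0 - sigma_star / sigma, 0.0, 1.0))


def refine(x_init, velocity_fn, sigma_star, sigma=1.0, n_steps=10):
    """Euler-integrate dx/dt = velocity_fn(x, t) from t* to 1.

    x_init      : (n_atoms, 3) coordinates from an upstream generator
    velocity_fn : callable (x, t) -> (n_atoms, 3); stands in for a trained
                  model (conditioning on the molecular graph is omitted here)
    sigma_star  : assumed residual noise level of the upstream conformer
    """
    t_star = start_time(sigma_star, sigma)
    ts = np.linspace(t_star, 1.0, n_steps + 1)  # only covers [t*, 1]
    x = np.array(x_init, dtype=float)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)
    return x


# Toy usage: a made-up "true" conformer and a perturbed upstream sample.
rng = np.random.default_rng(0)
x_true = rng.normal(size=(5, 3))                  # hypothetical target
x_upstream = x_true + 0.2 * rng.normal(size=(5, 3))


def oracle_velocity(x, t):
    # Conditional velocity of the straight-line path toward x_true.
    # A real refiner would predict this without access to x_true.
    return (x_true - x) / max(1.0 - t, 1e-8)


x_refined = refine(x_upstream, oracle_velocity, sigma_star=0.2, n_steps=4)
```

With `sigma_star=0.2` and `sigma=1.0`, integration starts at `t = 0.8`, so the four Euler steps cover only the last fifth of the time interval. In practice `sigma_star` would have to be estimated or tuned for the upstream model.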
