Flow-Matching Based Refiner for Molecular Conformer Generation

Xiangyang Xu, Hongyang Gao

TL;DR

This work tackles low-energy molecular conformer generation by augmenting denoising-based flow matching with a refinement stage. The refiner starts from upstream conformers and uses a rescheduled noise scale to bypass the poorly trained high-noise (low-SNR) regime. The proposed Flow-Matching Based Refiner defines a data-centered base distribution $\mathbf x_0 = \mathbf x_1 + \sigma\boldsymbol{\varepsilon}$ with $\sigma=1$ and trains a time-conditioned velocity field to progressively refine conformers. This design enables self-calibration of the starting time, $t^*=1-\sigma^*/\sigma$, where $\sigma^*$ is the noise level assigned to the upstream conformer. It also enables stable sampling by integrating $\frac{d}{dt}\mathbf x_t = \mathbf u_\theta(\mathbf x_t,t,\mathcal{G})$ from $t^*$ to $1$. Empirically, on GEOM-Drugs and GEOM-QM9, the generator-refiner pipeline uses fewer total steps to achieve higher sample quality (lower average minimum RMSD, $\text{AMR}$) and preserved diversity (higher coverage, $\text{COV}$). It also improves chemical-property realism as measured by xTB metrics. By reducing error accumulation during early denoising and maintaining ensemble diversity, the approach offers practical gains in sampling efficiency and robustness for drug-discovery workflows.
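A short derivation shows where the calibration rule comes from. It assumes the linear interpolation path that is standard in conditional flow matching, $\mathbf x_t = (1-t)\,\mathbf x_0 + t\,\mathbf x_1$. The TL;DR does not state the path explicitly, so treat this as an illustrative sketch rather than the paper's exact formulation.

```latex
\begin{aligned}
\mathbf x_t &= (1-t)\,\mathbf x_0 + t\,\mathbf x_1
             = \mathbf x_1 + (1-t)\,\sigma\boldsymbol{\varepsilon},
  && \text{using } \mathbf x_0 = \mathbf x_1 + \sigma\boldsymbol{\varepsilon},\\
\frac{d}{dt}\mathbf x_t &= \mathbf x_1 - \mathbf x_0 = -\sigma\boldsymbol{\varepsilon},
  && \text{regression target for } \mathbf u_\theta(\mathbf x_t, t, \mathcal G),\\
(1-t^*)\,\sigma &= \sigma^*
  \;\Longrightarrow\; t^* = 1 - \sigma^*/\sigma,
  && \text{match the residual noise of the upstream conformer.}
\end{aligned}
```

For example, with $\sigma = 1$, an upstream conformer assigned a residual noise level of $\sigma^* = 0.2$ would enter the refiner at $t^* = 0.8$. It would then traverse only the final, high-SNR fifth of the trajectory.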

Abstract

Low-energy molecular conformer generation (MCG) is a foundational yet challenging problem in drug discovery. Denoising-based methods, including diffusion and flow-matching models, learn mappings from a simple base distribution to the molecular conformer distribution. However, these approaches often suffer from error accumulation during sampling, especially in the low-SNR steps, which are difficult to train. To address this, we propose a flow-matching refiner for the MCG task. The proposed method initializes sampling from the mixed-quality outputs of upstream denoising models and reschedules the noise scale to bypass the low-SNR phase, thereby improving sample quality. On the GEOM-QM9 and GEOM-Drugs benchmark datasets, the generator-refiner pipeline improves quality with fewer total denoising steps while preserving diversity.
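The refinement loop can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes:

- the linear path $\mathbf x_t = (1-t)\mathbf x_0 + t\mathbf x_1$;
- the calibration rule $t^* = 1 - \sigma^*/\sigma$ from the TL;DR;
- a plain fixed-step Euler solver;
- a hypothetical `velocity_fn(x, t, graph)` standing in for the trained $\mathbf u_\theta$.

The helper names `calibrated_start_time`, `training_pair`, `refine` and `oracle_velocity` are illustrative only. The toy oracle points straight at a known reference conformer, so it checks only the bookkeeping (start time, step sizes, endpoint), not learning.

```python
import numpy as np


def calibrated_start_time(sigma_star, sigma=1.0):
    """Self-calibrated start time t* = 1 - sigma*/sigma, clipped to [0, 1]."""
    return float(np.clip(1.0 - sigma_star / sigma, 0.0, 1.0))


def training_pair(x1, t, sigma=1.0, rng=None):
    """Return (x_t, target velocity) for a conformer x1 of shape (n_atoms, 3).

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with the
    data-centered base sample x0 = x1 + sigma * eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x1.shape)
    x0 = x1 + sigma * eps
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0  # target velocity, equal to -sigma * eps


def refine(x_up, velocity_fn, graph, sigma_star, sigma=1.0, n_steps=10):
    """Euler-integrate dx/dt = velocity_fn(x, t, graph) from t* to 1."""
    x = np.array(x_up, dtype=float)
    t_star = calibrated_start_time(sigma_star, sigma)
    if t_star >= 1.0:
        return x  # zero residual noise: nothing left to refine
    ts = np.linspace(t_star, 1.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0, graph)
    return x


# Toy check with hypothetical data: a random "reference" conformer and an
# upstream sample perturbed with residual noise sigma* = 0.2.
rng = np.random.default_rng(0)
x_true = rng.standard_normal((5, 3))
x_up = x_true + 0.2 * rng.standard_normal((5, 3))


def oracle_velocity(x, t, graph):
    # Exact velocity of the straight path toward x_true; stands in for u_theta.
    return (x_true - x) / (1.0 - t)


x_ref = refine(x_up, oracle_velocity, graph=None, sigma_star=0.2)
print(np.allclose(x_ref, x_true))  # True
```

On the final interval the Euler step size equals $1 - t$, so the oracle lands exactly on the reference. A learned $\mathbf u_\theta$ would only approximate this. The early return matters for $\sigma^* = 0$, where the straight-path velocity would otherwise be evaluated at $t = 1$.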

Paper Structure

This paper contains 21 sections, 18 equations, 3 figures, 7 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Comparison of neighbor degree distributions during training: (a) with a maximum radius of 2.5; (b) with a maximum radius of 5.0.
  • Figure 2: Velocity fields on GEOM-QM9: (a) ET-Flow sampling; (b) Refiner; (c) Refiner with randomized $t$.
  • Figure 3: GEOM-QM9 AMR-precision dynamics during refinement.
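Figure 3 tracks AMR on the precision side. The TL;DR reports quality through AMR and diversity through COV. The sketch below uses the definitions common in conformer-generation benchmarks:

- Recall-side metrics look, for each reference conformer, at its closest generated conformer.
- Precision-side metrics swap the roles.
- COV counts matches below a threshold $\delta$.

It assumes an RMSD matrix has already been computed with an alignment-aware RMSD (for example via RDKit). The function name `coverage_and_amr`, the matrix values and $\delta$ below are hypothetical and chosen only for illustration, not taken from the paper.

```python
import numpy as np


def coverage_and_amr(rmsd, delta):
    """COV and AMR from an RMSD matrix of shape (n_generated, n_reference).

    Recall (-R): for each reference conformer, the closest generated one.
    Precision (-P): for each generated conformer, the closest reference.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    min_per_ref = rmsd.min(axis=0)  # best generated match for each reference
    min_per_gen = rmsd.min(axis=1)  # best reference match for each sample
    return {
        "COV-R": float(np.mean(min_per_ref < delta)),
        "AMR-R": float(np.mean(min_per_ref)),
        "COV-P": float(np.mean(min_per_gen < delta)),
        "AMR-P": float(np.mean(min_per_gen)),
    }


# Hypothetical RMSD values (Angstrom): 3 generated vs 2 reference conformers.
rmsd = np.array([[0.3, 1.2],
                 [0.9, 0.4],
                 [1.5, 1.1]])
metrics = coverage_and_amr(rmsd, delta=0.5)
print(metrics)
```

In this toy matrix, both reference conformers are matched closely (full recall coverage). The third generated conformer is far from every reference, which lowers COV-P and raises AMR-P. A refiner that pulls such outliers toward the data would show up as falling AMR-P, the quantity plotted in Figure 3.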