ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation

Majdi Hassan, Nikhil Shenoy, Jungyoon Lee, Hannes Stark, Stephan Thaler, Dominique Beaini

TL;DR

This work introduces Equivariant Transformer Flow (ET-Flow), a straightforward and scalable method that directly operates on all-atom coordinates with minimal assumptions and significantly increases the precision and physical validity of the generated conformers, while being a lighter model and faster at inference.

Abstract

Predicting low-energy molecular conformations given a molecular graph is an important but challenging task in computational drug discovery. Existing state-of-the-art approaches either resort to large-scale transformer-based models that diffuse over conformer fields, or use computationally expensive methods to generate initial structures and diffuse over torsion angles. In this work, we introduce Equivariant Transformer Flow (ET-Flow). We show that a well-designed flow matching approach with equivariance and a harmonic prior alleviates the need for complex internal geometry calculations and large architectures, contrary to the prevailing methods in the field. Our approach results in a straightforward and scalable method that directly operates on all-atom coordinates with minimal assumptions. With the advantages of equivariance and flow matching, ET-Flow significantly increases the precision and physical validity of the generated conformers, while being a lighter model and faster at inference. Code is available at https://github.com/shenoynikhil/ETFlow.

Paper Structure

This paper contains 30 sections, 25 equations, 7 figures, 9 tables, 3 algorithms.

Figures (7)

  • Figure 1: (a) Overview of ET-Flow. The model predicts a conditional vector field $\vec{v}_\theta$ using interpolated positions ($x_t$), molecular structure ($G$), and time-step ($t$). Samples are drawn from the harmonic prior ($x_0 \sim p_0$) and then rotationally aligned with the samples from data ($x_1 \sim p_1$). A conditional probability path is constructed between pairs of $x_0$ and $x_1$, and $x_t$ is then sampled from this path at a random time $t$. (b) The ET-Flow architecture consists of a representation module based on the TorchMD-NET architecture (Thölke and De Fabritiis, 2022) and an equivariant vector output module. For detailed architecture and input preprocessing information, see the architecture appendix.
  • Figure 2: Stochastic sampling procedure used in inference. Noise is added to the positions $x_t$ (purple line), resulting in $\hat{x}_t$. The model then predicts the vector field $\hat{v}_t$ from $\hat{x}_t$ instead of $x_t$ (yellow line) and updates $\hat{x}_t$ using $\hat{v}_t$ to get $x_{t+1}$.
  • Figure 3: Recall and Precision Coverage on GEOM-DRUGS as a function of the threshold distance. ET-Flow outperforms TorsionDiff by a large margin in both Recall and Precision, especially at lower thresholds.
  • Figure 4: Sampling efficiency on GEOM-DRUGS: quality and inference time with respect to the number of time steps.
  • Figure 5: (a) Overall architecture of ET-Flow, consisting of two components: (1) a representation layer based on TorchMD-NET (Thölke and De Fabritiis, 2022) and (2) an equivariant output layer from Schütt et al. (2018). (b) Equivariant attention layer with all the operations involved. (c) Multi-head attention block modified with LayerNorm.
  • ...and 2 more figures
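
The procedures described in the captions of Figures 1 and 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a straight-line conditional path $x_t = (1 - t)\,x_0 + t\,x_1$ with target velocity $x_1 - x_0$, uses Kabsch alignment for the rotational alignment step, and leaves the prior sample `x0` to the caller (the harmonic prior is not implemented here). The function names and the `noise_scale` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def kabsch_align(x0, x1):
    """Rigidly align x0 (N, 3) onto x1 (N, 3) by minimizing RMSD (Kabsch)."""
    c0, c1 = x0.mean(axis=0), x1.mean(axis=0)
    p, q = x0 - c0, x1 - c1
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rot @ p_i approximates q_i
    return p @ rot.T + c1


def make_training_pair(x0, x1, t):
    """Return (x_t, target velocity) on an assumed straight-line path."""
    x0 = kabsch_align(x0, x1)      # align the prior sample to the data sample
    x_t = (1.0 - t) * x0 + t * x1  # point on the path at time t
    return x_t, x1 - x0            # constant velocity along a straight line


def stochastic_euler_step(v_fn, x_t, t, dt, noise_scale, rng):
    """One sampling step: perturb x_t, evaluate the field there, then move."""
    x_hat = x_t + noise_scale * rng.standard_normal(x_t.shape)
    v_hat = v_fn(x_hat, t)         # field predicted at the noised positions
    return x_hat + dt * v_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=(6, 3))   # stand-in for a data conformer
    x0 = rng.normal(size=(6, 3))   # stand-in for a prior sample (assumption)
    x_t, target = make_training_pair(x0, x1, t=0.3)
    print(x_t.shape, target.shape)
```

In training, a model $\vec{v}_\theta(x_t, G, t)$ would be regressed onto `target`; at inference, `stochastic_euler_step` would be called repeatedly with the trained model as `v_fn`. Setting `noise_scale=0` reduces the step to a plain Euler update.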