Repurformer: Transformers for Repurposing-Aware Molecule Generation

Changhun Lee, Gyumin Lee

TL;DR

Repurformer tackles the sample bias problem in target-specific de novo molecule generation. It exploits multi-hop protein–compound relationships through bi-directional pretraining and FFT-based latent-space processing with low-pass filtering (LPF). The method learns latent 2-hop relations with encoders trained in both the protein→compound and compound→protein directions. It then generates molecules with a compound decoder guided by a low-frequency-enriched representation that emphasizes longer-range interactions. Experiments on BindingDB show that Repurformer generates valid, diverse substitutes that resemble positive compounds and outperforms several baselines on multiple metrics. They also reveal a trade-off between validity and diversity (i.e., a risk of mode collapse) that depends on the frequency cutoff. The work advances repurposing-aware generation and suggests future integration with diffusion or graph-based models and reinforcement learning to improve diversity and real-world applicability for drug repurposing.
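The FFT-plus-LPF step described above can be illustrated with a minimal NumPy sketch. It assumes the latent representation is a real array of shape (seq_len, d_model) and that filtering runs along the sequence axis. The function name `fft_low_pass` and the `cutoff_ratio` parameter (fraction of non-negative frequency bins kept) are hypothetical. This summary does not specify how Repurformer defines its cutoff (possibly the $\alpha$ in Figure 5) or how it combines the filtered signal with the original.

```python
import numpy as np

def fft_low_pass(latent: np.ndarray, cutoff_ratio: float) -> np.ndarray:
    """Low-pass filter a real (seq_len, d_model) latent array along the sequence axis.

    cutoff_ratio in (0, 1] is the fraction of non-negative frequency bins kept
    (a hypothetical parameterization of the frequency cutoff).
    """
    if not 0.0 < cutoff_ratio <= 1.0:
        raise ValueError("cutoff_ratio must be in (0, 1]")
    seq_len = latent.shape[0]
    # Real FFT over positions: shape (seq_len // 2 + 1, d_model); bin 0 is the DC term.
    spectrum = np.fft.rfft(latent, axis=0)
    n_bins = spectrum.shape[0]
    keep = max(1, int(np.ceil(cutoff_ratio * n_bins)))  # always keep the DC bin
    spectrum[keep:] = 0.0  # zero out the high-frequency components
    # Inverse FFT back to the position domain; n restores the original length.
    return np.fft.irfft(spectrum, n=seq_len, axis=0)
```

For example, `fft_low_pass(np.random.default_rng(0).normal(size=(64, 16)), 0.1)` keeps only slowly varying components across positions. This is one way to read "low-frequency-enriched representation". With `cutoff_ratio=1.0` the input is reconstructed up to floating-point error.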

Abstract

Generating molecules that are as diverse as possible while having desired properties is crucial for drug discovery, and many current approaches rely on deep generative models. Despite recent advances in these models, particularly variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as the sample bias problem remains. This problem occurs when molecules generated for the same target protein tend to be structurally similar, which reduces the diversity of generation. To address this, we propose leveraging multi-hop relationships among proteins and compounds. Our model, Repurformer, integrates bi-directional pretraining with the Fast Fourier Transform (FFT) and low-pass filtering (LPF) to capture complex interactions and generate diverse molecules. Experiments on the BindingDB dataset confirm that Repurformer successfully creates substitutes for anchor compounds that resemble positive compounds, increasing the diversity between anchor and generated compounds.
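One common way to quantify the diversity between anchor and generated compounds is the Tanimoto distance between molecular fingerprints. The sketch below assumes fingerprints are given as sets of on-bit indices (for example, from a Morgan/ECFP fingerprint). The function names are hypothetical, and this summary does not state which fingerprint type or distance the paper uses.

```python
from typing import Iterable, List, Set

def tanimoto_similarity(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(a & b) / len(a | b)

def anchor_distances(anchor: Set[int], generated: Iterable[Set[int]]) -> List[float]:
    """Tanimoto distances (1 - similarity) from one anchor to each generated compound."""
    return [1.0 - tanimoto_similarity(anchor, g) for g in generated]
```

RDKit's `DataStructs.TanimotoSimilarity` computes the same similarity directly on bit vectors. A larger distance from the anchor indicates a more structurally different substitute, and a histogram of these values gives a distance distribution like the one in Figure 4(a).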

Paper Structure

This paper contains 24 sections, 6 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: (a) illustrates a many-to-many relationship between proteins and compounds. The bold lines indicate potential repurposing flows: given an anchor compound's target protein $p$ (P-45984), a positive compound $c^{+}$ (C-5280445) can be considered as a replacement for the anchor compound $\acute{c}$ (C-16046126). Red boxes in (b) and (c) mark the parts of $p$ (P-45984) that $\acute{c}$ (C-16046126) and $c^{+}$ (C-5280445) attend to, respectively. Notably, the attended regions are adjacent, suggesting that $c^{+}$ may be repurposable for $p$.
  • Figure 2: Overview of Repurformer
  • Figure 3: Comparison of 2D molecule drawings. From left to right: the anchor $\acute{c}$, the positive $c^{+}$, and the generated compound $\hat{c}^{+}$. $\hat{c}^{+}$ is expected to interact with the target protein of $\acute{c}$.
  • Figure 4: (a) shows the distribution of distances between molecular fingerprints. (b) shows the estimated two-dimensional Gaussian distributions of the anchor, positive, and generated compounds.
  • Figure 5: Validity-uniqueness trade-off at different values of $\alpha$. Validity reflects the quality of generated samples, while uniqueness reflects their diversity.
  • ...and 3 more figures
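Figure 5's two metrics are commonly computed as follows. Validity is the fraction of generated SMILES strings that parse into molecules, and uniqueness is the fraction of distinct canonical SMILES among the valid ones. The sketch below follows those standard definitions, which the paper may refine. `validity_uniqueness` is a hypothetical helper, and the `canonicalize` callable is injected so the code runs without a chemistry toolkit.

```python
from typing import Callable, Iterable, Optional, Tuple

def validity_uniqueness(
    generated: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Tuple[float, float]:
    """Return (validity, uniqueness) for a batch of generated SMILES strings.

    canonicalize must return a canonical SMILES string for a valid molecule
    and None for an invalid one.
    """
    samples = list(generated)
    if not samples:
        return 0.0, 0.0
    canonical = [canonicalize(s) for s in samples]
    valid = [c for c in canonical if c is not None]
    validity = len(valid) / len(samples)  # share of parseable outputs
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0  # distinct among valid
    return validity, uniqueness
```

With RDKit installed, a suitable `canonicalize` is `lambda s: Chem.MolToSmiles(m) if (m := Chem.MolFromSmiles(s)) is not None else None`, since `Chem.MolFromSmiles` returns `None` for unparsable strings. Canonicalizing before counting matters: two different SMILES strings can encode the same molecule.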

Theorems & Definitions (3)

  • Definition 3.1: Protein-Compound Graph
  • Definition 3.2: Protein-Compound Pair
  • Definition 3.3: Anchor/Positive Compounds
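The definitions above suggest a bipartite protein-compound graph in which two compounds are 2 hops apart when they share a protein. The sketch below is one plausible reading, not the paper's exact Definition 3.3. It indexes (protein, compound) pairs in both directions and lists the compounds 2 hops from an anchor as candidate positives. It also emits one training example per direction to mirror the protein→compound and compound→protein pretraining. All function names and the `"p2c"`/`"c2p"` direction tags are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Index = Dict[str, Set[str]]

def build_bipartite(pairs: Iterable[Tuple[str, str]]) -> Tuple[Index, Index]:
    """Index (protein, compound) interaction pairs in both directions."""
    p2c: Index = defaultdict(set)  # protein -> compounds it interacts with
    c2p: Index = defaultdict(set)  # compound -> proteins it interacts with
    for protein, compound in pairs:
        p2c[protein].add(compound)
        c2p[compound].add(protein)
    return p2c, c2p

def two_hop_compounds(anchor: str, p2c: Index, c2p: Index) -> List[str]:
    """Compounds reachable from `anchor` via compound -> protein -> compound."""
    reached: Set[str] = set()
    for protein in c2p.get(anchor, set()):
        reached |= p2c.get(protein, set())
    reached.discard(anchor)  # the anchor itself is not a candidate substitute
    return sorted(reached)

def bidirectional_examples(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """One (direction, source, target) example per pretraining direction for each pair."""
    examples: List[Tuple[str, str, str]] = []
    for protein, compound in pairs:
        examples.append(("p2c", protein, compound))  # protein -> compound
        examples.append(("c2p", compound, protein))  # compound -> protein
    return examples
```

Training on both directions lets each encoder see every interaction from the protein side and from the compound side. This is one way an encoder could pick up the 2-hop compound-protein-compound structure implicitly rather than from explicit path enumeration.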