Repurformer: Transformers for Repurposing-Aware Molecule Generation
Changhun Lee, Gyumin Lee
TL;DR
Repurformer tackles the sample bias problem in target-specific de novo molecule generation by exploiting multi-hop protein–compound relationships through bi-directional pretraining and latent-space processing based on the Fast Fourier Transform (FFT) and low-pass filtering (LPF). The method learns latent 2-hop relations via encoders trained in the protein→compound and compound→protein directions, then generates molecules with a compound decoder guided by a low-frequency-enriched representation that emphasizes longer-range interactions. Empirical results on BindingDB show that Repurformer can generate valid, diverse substitutes that resemble positive compounds, outperforming several baselines on several metrics, while revealing a trade-off between validity and diversity (mode collapse) that depends on the frequency cutoff. The work advances repurposing-aware generation and suggests future integration with diffusion or graph-based models and reinforcement learning to enhance diversity and real-world applicability for drug repurposing.
Abstract
Generating molecules that are as diverse as possible while retaining desired properties is crucial for drug discovery research, and has motivated many approaches based on deep generative models. Despite recent advances in these models, particularly variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as the sample bias problem remains. This problem occurs when generated molecules targeting the same protein tend to be structurally similar, reducing the diversity of generation. To address this, we propose leveraging multi-hop relationships among proteins and compounds. Our model, Repurformer, integrates bi-directional pretraining with Fast Fourier Transform (FFT) and low-pass filtering (LPF) to capture complex interactions and generate diverse molecules. A series of experiments on the BindingDB dataset confirms that Repurformer successfully creates substitutes for anchor compounds that resemble positive compounds, increasing diversity between the anchor and generated compounds.
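
To make the FFT and low-pass filtering step concrete, the following is a minimal sketch of low-pass filtering a one-dimensional sequence in the frequency domain. It is not the Repurformer implementation: the function names (`dft`, `idft`, `low_pass`), the toy `latent` sequence, and the `cutoff` convention (keep bins within `cutoff` of the zero-frequency bin) are assumptions made for illustration, and the axis along which the model applies the transform is not specified here. A naive DFT is used only to keep the example dependency-free; a real model would more likely use an FFT routine from a numerical library.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass(signal, cutoff):
    """Zero every frequency bin farther than `cutoff` from the DC bin.

    Bin k and bin n - k carry the same absolute frequency, so both are
    kept or dropped together; this keeps the filtered signal real.
    """
    n = len(signal)
    spectrum = dft(signal)
    filtered = [
        coeff if min(k, n - k) <= cutoff else 0
        for k, coeff in enumerate(spectrum)
    ]
    return [value.real for value in idft(filtered)]


# Toy "latent" sequence (hypothetical): one slow sine plus a fast alternation.
n = 16
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
fast = [0.5 * (-1) ** t for t in range(n)]
latent = [s + f for s, f in zip(slow, fast)]

# A small cutoff keeps the slow component and removes the fast one.
smoothed = low_pass(latent, cutoff=2)
```

In this toy setting, a small `cutoff` recovers only the slowly varying component, while a `cutoff` of `n // 2` leaves the sequence unchanged. This mirrors, in a simplified way, how the choice of frequency cutoff controls how much fine-grained variation survives in the filtered representation.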
