Synergistic Benefits of Joint Molecule Generation and Property Prediction

Adam Izdebski, Jan Olszewski, Pankhil Gawade, Krzysztof Koras, Serra Korkmaz, Valentin Rauscher, Jakub M. Tomczak, Ewa Szczurek

TL;DR

Hyformer is a unified transformer architecture that jointly models data generation and property prediction for molecules by combining an autoregressive decoder with a bidirectional encoder in a single shared backbone. Through alternating attention and a joint pre-training scheme, it achieves synergistic benefits in conditional sampling, out-of-distribution (OOD) property prediction, and representation learning, while maintaining competitive generative and predictive performance. The method is validated on conditional molecule generation, OOD property prediction, molecular representation learning, unconditional generation, and antimicrobial peptide design, demonstrating real-world applicability in drug discovery. These results highlight the potential of flexible, jointly trained transformers to streamline end-to-end molecular design pipelines where generation quality and predictive robustness are both crucial.

Abstract

Modeling the joint distribution of data samples and their properties makes it possible to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. We show that Hyformer is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction, and representation learning. Finally, we demonstrate the benefits of joint learning in a drug design use case of discovering novel antimicrobial peptides.
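
The alternating attention idea can be illustrated with a minimal sketch, assuming a shared backbone that switches only its attention mask between tasks: a causal mask for generation and a full (bidirectional) mask for property prediction. The task names "generate" and "predict" and both function names are hypothetical placeholders chosen for illustration, not identifiers from Hyformer, and the attention is reduced to a masked softmax over precomputed scores.

```python
import math


def attention_mask(task, seq_len):
    """Return a seq_len x seq_len mask; True means position j is visible to position i.

    The task names are hypothetical stand-ins for the value of the task token.
    """
    if task == "generate":
        # Causal mask: each position sees itself and earlier positions only.
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    if task == "predict":
        # Bidirectional mask: every position sees the whole sequence.
        return [[True] * seq_len for _ in range(seq_len)]
    raise ValueError(f"unknown task: {task!r}")


def masked_attention(scores, mask):
    """Row-wise softmax over attention scores, giving zero weight to masked positions."""
    weights = []
    for row, mask_row in zip(scores, mask):
        # Subtract the largest visible score for numerical stability.
        peak = max(s for s, m in zip(row, mask_row) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    scores = [[0.0] * 3 for _ in range(3)]  # uniform scores, for illustration only
    for task in ("generate", "predict"):
        print(task, masked_attention(scores, attention_mask(task, 3)))
```

In this sketch the same scores yield different attention weights depending only on the mask, which is how a single set of parameters could serve both the autoregressive and the bidirectional mode.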

Paper Structure

This paper contains 53 sections, 4 theorems, 20 equations, 9 figures, 11 tables, 1 algorithm.

Key Result

Lemma 4.1

Let $p(\mathbf{x}, y)$ be a joint probability distribution over $\mathcal{X} \times \mathcal{Y}$. If $y_{c} \in \mathcal{Y}$ is a property value such that $p(y_{c}) > 0$, then

Figures (9)

  • Figure 1: A schematic representation of Hyformer. Depending on the task token $\texttt{[TASK]}$, Hyformer uses either a causal or a bidirectional mask, outputting token probabilities or predicted property values.
  • Figure 2: (a) Amino-acid distributions between the pre-trained and unconditionally generated sequences. (b) Distributions of charge, aromaticity, and isoelectric point (pI) for: non-AMP, AMP and conditionally generated sequences. (c) Frequency of crossing an attention threshold (x-axis) vs. mean attention weight (y-axis) for distinct amino-acids, colored by charge and sized by hydrophobicity.
  • Figure 3: Structures of twelve molecules generated with Hyformer at a sampling temperature of 0.9, visualized using RDKit, together with their properties.
  • Figure 4: Structures of twelve molecules generated with Hyformer at a sampling temperature of 1.0, visualized using RDKit, together with their properties.
  • Figure 5: Structures of twelve molecules generated with Hyformer at a sampling temperature of 1.1, visualized using RDKit, together with their properties.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Lemma 4.1
  • proof
  • Lemma C.1
  • proof
  • Corollary C.2
  • proof
  • Lemma C.1
  • proof