Kernel-Elastic Autoencoder for Molecular Design

Haote Li; Yu Shee; Brandon Allen; Federica Maschietto; Victor Batista

TL;DR

This work introduces the Kernel-Elastic Autoencoder (KAE), a self-supervised, transformer-based generative model with enhanced performance for molecular design that sets a new state of the art in constrained optimization.

Abstract

We introduce the Kernel-Elastic Autoencoder (KAE), a self-supervised generative model based on the transformer architecture with enhanced performance for molecular design. KAE is formulated based on two novel loss functions: modified maximum mean discrepancy and weighted reconstruction. KAE addresses the long-standing challenge of achieving valid generation and accurate reconstruction at the same time. KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstructions on the independent testing dataset, surpassing previous molecule-generating models. KAE enables conditional generation and allows for decoding based on beam search, resulting in state-of-the-art performance in constrained optimizations. Furthermore, KAE can generate molecules conditioned on favorable binding affinities in docking applications, as confirmed by AutoDock Vina and Glide scores, outperforming all existing candidates from the training dataset. Beyond molecular design, we anticipate that KAE could be applied to solve problems by generation in a wide range of applications.
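For readers unfamiliar with maximum mean discrepancy (MMD), the following is a minimal sketch of the standard, biased MMD estimator with a Gaussian kernel, comparing a batch of latent vectors against samples from a Gaussian prior. It is not the modified m-MMD loss of the paper, whose exact form is not given in this summary. The function names, the kernel bandwidth `sigma`, and the choice of a standard normal prior are assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    This is the textbook estimator, NOT the paper's modified m-MMD.
    """
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


def gaussian_samples(n, dim, mean=0.0, std=1.0, rng=random):
    """Draw n vectors of length dim with i.i.d. Gaussian entries (hypothetical helper)."""
    return [[rng.gauss(mean, std) for _ in range(dim)] for _ in range(n)]


# Toy comparison: latents that match the prior vs. latents shifted away from it.
rng = random.Random(0)
prior = gaussian_samples(64, 4, rng=rng)
latents_matched = gaussian_samples(64, 4, rng=rng)
latents_shifted = gaussian_samples(64, 4, mean=3.0, rng=rng)

mmd_matched = mmd_squared(latents_matched, prior)
mmd_shifted = mmd_squared(latents_shifted, prior)
```

In this toy setup the shifted batch yields the larger value, which is the behavior a regularizer of this kind relies on: the penalty grows as the latent distribution drifts away from the prior.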

Paper Structure (26 sections, 14 equations, 13 figures, 6 tables)

Introduction
Result
KAE Performance
Learning Behavior
Conditional KAE
CKAE for Ligand Docking
Comparison to GFlowNet
Glide Analysis
Discussion
Methods
Model Architecture
KAE Loss
KAE m-MMD Loss
Decoding Methods
Docking Methods
...and 11 more sections

Figures (13)

Figure 1: Conditional KAE transformer architecture. KAE consists of 6 encoder layers, 6 decoder layers, and a latent space for conditional generations. During training, the condition is concatenated after positional embedding and provided as input to the 4-head attention encoder. The condition is also concatenated with the latent vector before a mixing layer. During training, Gaussian noise is added to the latent vectors. The decoder output is then passed through a linear layer and softmax function, producing the probabilities of output tokens for each character in the dictionary of size $T$.
Figure 2: Comparison of learning rates for models trained with m-MMD loss, s-MMD loss, and KL divergence loss. (a) Validity evaluated at each epoch. (b) Fraction of molecules properly reconstructed as a function of epochs. (c) Novelty evaluated at each epoch. (d) The uniqueness at each epoch. The model labeled as KL includes an extra layer that estimates the standard deviation of each latent vector. The models labeled with m-MMD are trained with the loss $\mathcal{L}_{CEL} + \textit{m-MMD}(\lambda=1)$, s-MMD with $\mathcal{L}_{CEL} + \textit{s-MMD}(\lambda=1)$, and KL with $\mathcal{L}_{VAE}=\mathcal{L}_{CEL} + \textit{KL}(\lambda=1)$. "No noise" means no noise is added to the latent vectors during training.
Figure 3: CKAE correlation performance. The blue dots represent the mean PLogP values of 1,000 molecules generated by CKAE, as a function of the condition PLogP value. The error bar on each dot indicates the associated standard deviation, used as an estimate of the error. The black line shows the ground truth values, which are strongly correlated with the mean PLogP values. The histogram shows the underlying distribution of the training dataset over the entire range of PLogP values.
Figure 4: Glide analysis of molecular inhibitors docked at the active site of sEH. (a) Binding interactions of the top-scoring molecules generated by CKAE (left), found by searching the training dataset (middle), and generated by GFlowNet (right). (b) Extra Precision (XP) Glide score Boltzmann factors for the top ten candidates obtained from CKAE and the training dataset (TD) show that the top-ranking CKAE-generated molecules outperform the top molecules from the TD ensemble. (c) Histogram of Glide XP docking scores, showing that the top-scoring inhibitors from CKAE or TD outperform 869 tautomers generated from the top ten candidates of the two datasets.
Figure S1: Comparison of learning rates for models trained with different $\lambda$ values for KL divergence loss. (a) Validity evaluated at each epoch. (b) Fraction of molecules properly reconstructed as a function of epochs. (c) Novelty evaluated at each epoch. (d) The uniqueness at each epoch. The models are trained with the loss $\mathcal{L}_{VAE}=\mathcal{L}_{CEL} + \textit{KL}(\lambda)$. The m-MMD model is trained with the loss $\mathcal{L}_{CEL} + \textit{m-MMD}(\lambda=1)$.
...and 8 more figures
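The Figure 1 caption mentions three operations around the latent space: adding Gaussian noise to the latent vector during training, concatenating the condition with the latent vector, and mapping the decoder output to token probabilities with a linear layer and softmax. The sketch below illustrates these operations in isolation on toy vectors. It is not the authors' implementation: the transformer encoder, mixing layer, and decoder are omitted, a single random linear layer stands in for everything between the latent vector and the logits, and all names, sizes, and the noise scale are hypothetical.

```python
import math
import random


def add_gaussian_noise(latent, std, rng):
    """Add i.i.d. Gaussian noise to each latent entry (training-time only)."""
    return [z + rng.gauss(0.0, std) for z in latent]


def concat_condition(latent, condition):
    """Append the condition value(s) to the latent vector."""
    return list(latent) + list(condition)


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sizes (assumptions): latent dimension 4, one scalar condition, dictionary size T = 5.
rng = random.Random(0)
T = 5
latent = [0.1, -0.3, 0.7, 0.0]
condition = [2.5]  # e.g. a target property value

noisy_latent = add_gaussian_noise(latent, std=0.1, rng=rng)
decoder_input = concat_condition(noisy_latent, condition)

# Random placeholder weights standing in for the omitted decoder stack.
weights = [[rng.uniform(-1.0, 1.0) for _ in decoder_input] for _ in range(T)]
bias = [0.0] * T

token_probs = softmax(linear(decoder_input, weights, bias))
```

Here `token_probs` has one entry per dictionary token and sums to one, matching the role of the final softmax described in the caption.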
