SOLD: SELFIES-based Objective-driven Latent Diffusion

Elbert Ho

TL;DR

This work proposes SOLD (SELFIES-based Objective-driven Latent Diffusion), a novel latent diffusion model that generates molecules in a latent space derived from 1D SELFIES strings and conditioned on a target protein.

Abstract

Recently, machine learning has made a significant impact on de novo drug design. However, current approaches to creating novel molecules conditioned on a target protein typically generate molecules directly in the 3D conformational space, which is often slow and overly complex. In this work, we propose SOLD (SELFIES-based Objective-driven Latent Diffusion), a novel latent diffusion model that generates molecules in a latent space derived from 1D SELFIES strings and conditioned on a target protein. In the process, we also train an innovative SELFIES transformer and propose a new way to balance losses when training multi-task machine learning models. Our model generates high-affinity molecules for the target protein in a simple and efficient way, while also leaving room for future improvements through the addition of more data.
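
To make the "1D SELFIES strings" mentioned in the abstract concrete, the following is a minimal sketch of how such a string might be split into symbols and mapped to integer ids before being fed to a sequence model. It is an illustration only and not the paper's tokenizer: the function names, the `[PAD]` symbol, and the fixed-length padding scheme are assumptions, and only single-fragment strings (no `.` separators) are handled.

```python
import re

# One bracketed SELFIES symbol, e.g. "[C]", "[=O]", "[Ring1]".
SYMBOL_PATTERN = re.compile(r"\[[^\[\]]+\]")

# Hypothetical padding symbol; not taken from the paper.
PAD = "[PAD]"


def tokenize_selfies(selfies: str) -> list[str]:
    """Split a single-fragment SELFIES string into its bracketed symbols."""
    tokens = SYMBOL_PATTERN.findall(selfies)
    # Reject input with characters outside the bracketed symbols.
    if "".join(tokens) != selfies:
        raise ValueError(f"not a plain sequence of SELFIES symbols: {selfies!r}")
    return tokens


def build_vocab(strings: list[str]) -> dict[str, int]:
    """Map every symbol seen in `strings` to an integer id (0 is padding)."""
    symbols = sorted({tok for s in strings for tok in tokenize_selfies(s)})
    vocab = {PAD: 0}
    for sym in symbols:
        vocab[sym] = len(vocab)
    return vocab


def encode(selfies: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Convert a SELFIES string to a fixed-length list of ids, right-padded."""
    ids = [vocab[tok] for tok in tokenize_selfies(selfies)]
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    ethanol = "[C][C][O]"
    benzene = "[C][=C][C][=C][C][=C][Ring1][=Branch1]"
    vocab = build_vocab([ethanol, benzene])
    print(tokenize_selfies(ethanol))
    print(encode(ethanol, vocab, max_len=8))
```

In a pipeline like the one the abstract describes, id sequences of this kind would be the input to the SELFIES transformer whose latent space the diffusion model operates in; how the paper actually tokenizes and pads is not stated here.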

Paper Structure

This paper contains 12 sections, 4 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of SOLD. Molecules are first encoded as SELFIES strings and then transformed into the latent space through a transformer. A diffusion model is then trained in the latent space to generate novel molecules. Proteins are encoded as ESM-2 embeddings on the right.
  • Figure 2: Some results of the evolutionary algorithm.
  • Figure 3: Renderings of a molecule generated by SOLD
  • Figure 4: Render of another molecule generated by SOLD
  • Figure 5: Renderings of Paxlovid
  • ...and 2 more figures