
Navigating Chemical Space with Latent Flows

Guanghao Wei, Yining Huang, Chenru Duan, Yue Song, Yuanqi Du

TL;DR

ChemFlow is a new framework that traverses chemical space by navigating, through flows, the latent space learned by molecule generative models. It takes a dynamical-systems perspective, formulating the problem as learning a vector field that transports the mass of the molecular distribution to regions with desired molecular properties or structural diversity.

Abstract

Recent progress in deep generative models in the vision and language domains has stimulated significant interest in generating more structured data such as molecules. However, beyond generating new random molecules, efficient exploration and a comprehensive understanding of the vast chemical space are of great importance to molecular science and to applications in drug design and materials discovery. In this paper, we propose a new framework, ChemFlow, to traverse chemical space by navigating, through flows, the latent space learned by molecule generative models. We introduce a dynamical system perspective that formulates the problem as learning a vector field that transports the mass of the molecular distribution to the region with desired molecular properties or structure diversity. Under this framework, we unify previous approaches to molecule latent space traversal and optimization and propose alternative competing methods incorporating different physical priors. We validate the efficacy of ChemFlow on molecule manipulation and on single- and multi-objective molecule optimization tasks under both supervised and unsupervised molecular discovery settings. Code and demos are publicly available on GitHub at https://github.com/garywei944/ChemFlow.
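The idea of a vector field that moves latent points toward a desired property can be illustrated with a minimal sketch. It assumes a toy quadratic property predictor and plain forward-Euler integration; the names `toy_property`, `toy_property_grad`, and `traverse` are hypothetical and are not taken from the ChemFlow code base.

```python
# Toy sketch: move a latent vector along the gradient of a property predictor.
# The quadratic predictor is a stand-in (an assumption), not ChemFlow's model.

def toy_property(z, target):
    """Hypothetical property predictor: highest when z equals target."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def toy_property_grad(z, target):
    """Analytic gradient of toy_property with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def traverse(z0, target, step=0.1, n_steps=50):
    """Forward-Euler integration of dz/dt = grad h(z); returns the full path."""
    z = list(z0)
    path = [list(z)]
    for _ in range(n_steps):
        g = toy_property_grad(z, target)
        z = [zi + step * gi for zi, gi in zip(z, g)]
        path.append(list(z))
    return path


target = [1.0, -2.0]
path = traverse(z0=[0.0, 0.0], target=target)
scores = [toy_property(z, target) for z in path]
```

In a real pipeline each point on `path` would be passed through a decoder to obtain a molecule; here the path only shows that the toy property score rises along the trajectory.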

Paper Structure

This paper contains 52 sections, 1 theorem, 40 equations, 16 figures, 10 tables, 3 algorithms.

Key Result

Proposition 3.1

(Global Convergence of Langevin Dynamics, adapted from Gelfand and Mitter, 1991). Given a Langevin dynamics in the form of [equation missing in source], where $\mathbf{w}_t$ is a $d$-dimensional Brownian motion, $a_t$ and $b_t$ are a set of positive numbers with $a_T, b_T \rightarrow 0$, and ${\bm{u}}_t$ is a set of random variables in $\mathbb{R}^n$ denoting noisy measurements of the energy function $h_\eta(\cdot)$. Under mild a...
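Because the update equation is missing from the statement above, the following is only a sketch of one common discretization of annealed Langevin dynamics, $z_{k+1} = z_k - a_k(\nabla h(z_k) + u_k) + b_k w_k$, which is an assumption rather than the paper's exact form. The toy energy, the schedules for `a_k` and `b_k`, and the function names are all hypothetical choices made for illustration.

```python
import math
import random


def energy_grad(z):
    """Gradient of a toy energy h(z) = 0.5 * sum(z_i^2), minimized at the origin."""
    return list(z)


def annealed_langevin(z0, n_steps=2000, seed=0):
    """Noisy gradient descent with step size a_k and noise scale b_k decaying to 0."""
    rng = random.Random(seed)
    z = list(z0)
    for k in range(1, n_steps + 1):
        a_k = 1.0 / (k + 10)             # illustrative step-size schedule
        b_k = 0.3 / math.sqrt(k + 10)    # illustrative noise schedule
        grad = energy_grad(z)
        new_z = []
        for zi, gi in zip(z, grad):
            u = 0.1 * rng.gauss(0.0, 1.0)  # noisy measurement of the gradient
            w = rng.gauss(0.0, 1.0)        # Brownian increment (unit variance)
            new_z.append(zi - a_k * (gi + u) + b_k * w)
        z = new_z
    return z


z_start = [3.0, -3.0]
z_final = annealed_langevin(z_start)
```

The sign convention here minimizes the energy; maximizing a property instead would amount to flipping the sign of the gradient term.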

Figures (16)

  • Figure 1: ChemFlow framework: (1) a pre-trained encoder $f_\theta(\cdot)$ and decoder $g_\psi(\cdot)$ map between molecules ${\bm{x}}$ and latent vectors ${\bm{z}}$; (2) a property predictor $h_\eta(\cdot)$ (green box) or a "Jacobian control" (yellow box) serves as the guidance to learn a vector field $\nabla_z\phi^k(t, {\bm{z}}_t)$ that maximizes the change in certain molecular properties (e.g. plogP, QED) or molecular structures; (3) during training, additional dynamical regularization is applied to the flow. The learned flows move the latent samples to change the structures and properties of the molecules smoothly. (Better seen in color.) The flow chart illustrates a case where a molecule is manipulated into a drug like caffeine.
  • Figure 2: Visualization of generated ligands docked against target ESR1 and ACAA1.
  • Figure 3: Molecular property plogP distribution shifts following the latent flow path.
  • Figure 4: Distribution shift for plogP optimization.
  • Figure 5: Optimization convergence. Langevin Dynamics shows faster convergence and achieves greater improvement in plogP.
  • ...and 11 more figures
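
Read as equations, the three components named in the Figure 1 caption could be written as below. This is a sketch that assumes the flow is the ODE generated by the learned vector field; the symbol $\hat{\mathbf{x}}_t$ for the decoded molecule is introduced here for illustration and does not appear in the caption.

```latex
\begin{aligned}
\mathbf{z}_0 &= f_\theta(\mathbf{x}), \\
\frac{d\mathbf{z}_t}{dt} &= \nabla_{\mathbf{z}} \phi^k(t, \mathbf{z}_t), \\
\hat{\mathbf{x}}_t &= g_\psi(\mathbf{z}_t).
\end{aligned}
```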

Theorems & Definitions (1)

  • Proposition 3.1