Gradient GA: Gradient Genetic Algorithm for Drug Molecular Design

Chris Zhuang, Debadyuti Mukherjee, Yingzhou Lu, Tianfan Fu, Ruqi Zhang

TL;DR

Gradient GA addresses slow convergence in genetic algorithms for drug molecular design by injecting gradient information into the search process. It builds a differentiable surrogate objective via a graph neural network and uses the Discrete Langevin Proposal to guide discrete molecular sampling toward optima, enabling gradient-based exploration in graphs. Empirical results show faster convergence and higher top-$K$ scores (up to ~25% improvement over vanilla GA) with competitive synthetic accessibility and manageable computational cost. This approach offers a principled, gradient-guided alternative to purely random GA exploration and holds promise for multi-objective extensions and more efficient chemical space navigation.

Abstract

Molecular discovery has brought great benefits to the chemical industry. Various molecule design techniques have been developed to identify molecules with desirable properties. Traditional optimization methods, such as genetic algorithms, continue to achieve state-of-the-art results across multiple molecular design benchmarks. However, these techniques rely solely on random-walk exploration, which limits both the quality of the final solution and the convergence speed. To address this limitation, we propose a novel approach called Gradient Genetic Algorithm (Gradient GA), which incorporates gradient information from the objective function into genetic algorithms. Instead of exploring randomly, each proposed sample iteratively progresses toward an optimal solution by following the gradient direction. We achieve this by designing a differentiable objective function parameterized by a neural network and using the Discrete Langevin Proposal to enable gradient guidance in discrete molecular spaces. Experimental results demonstrate that our method significantly improves both convergence speed and solution quality, outperforming cutting-edge techniques. For example, it achieves up to a 25% improvement in the top-10 score over the vanilla genetic algorithm. The code is publicly available at https://github.com/debadyuti23/GradientGA.
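
To make the idea of gradient guidance in a discrete space concrete, the following is a minimal sketch of a Discrete Langevin Proposal step on a toy problem. It is not the authors' implementation: the binary state vector, the quadratic objective in `grad_f`, the `target` vector, and the values of `beta` and `alpha` are all assumptions chosen for illustration, whereas the paper works on molecular graphs with a neural-network surrogate. The sketch also omits the optional Metropolis-Hastings correction. Each coordinate is resampled from a softmax that trades off a first-order gain, 0.5 * grad * (v - x), against a step-size penalty, (v - x)^2 / (2 * alpha).

```python
import numpy as np

def grad_f(x, target, beta=4.0):
    # Gradient of the toy objective f(x) = -beta * ||x - target||^2.
    # (Hypothetical stand-in for a differentiable surrogate objective.)
    return -2.0 * beta * (x - target)

def dlp_step(x, grad, alpha, rng):
    # One uncorrected Discrete Langevin Proposal step on {0, 1}^d.
    candidates = np.array([0.0, 1.0])
    delta = candidates[None, :] - x[:, None]            # shape (d, 2)
    logits = 0.5 * grad[:, None] * delta - delta**2 / (2.0 * alpha)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Sample every coordinate independently from its two-way categorical.
    new_x = (rng.random(x.shape[0]) < probs[:, 1]).astype(float)
    return new_x, probs

rng = np.random.default_rng(0)
d = 64
target = rng.integers(0, 2, size=d).astype(float)
x = 1.0 - target                                        # start fully mismatched
start_dist = int(np.sum(x != target))

for _ in range(20):
    x, probs = dlp_step(x, grad_f(x, target), alpha=0.2, rng=rng)

final_dist = int(np.sum(x != target))
print("Hamming distance to target:", start_dist, "->", final_dist)
```

Because the gradient raises the probability of flipping exactly those coordinates that disagree with the optimum, the chain drifts toward high-scoring states instead of wandering uniformly, which is the contrast with random-walk mutation that the abstract draws. A smaller `alpha` makes the proposal more conservative (fewer flips per step).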

Paper Structure

This paper contains 32 sections, 10 equations, 12 figures, 20 tables, 1 algorithm.

Figures (12)

  • Figure 1: Gradient GA pipeline.
  • Figure 2: Overview of DLP-based sampling procedure in Gradient GA, illustrating how the sampled molecule moves toward the optimum.
  • Figure 3: Comparison of Mestranol similarity AUC Top 10 scores as the number of oracle calls increases.
  • Figure 4: Comparison of Mestranol similarity AUC Top 100 scores as the number of oracle calls increases.
  • Figure 5: Heatmap of synthetic accessibility (SA) score for all methods and oracles (lower is better).
  • ...and 7 more figures

Theorems & Definitions (1)

  • Definition 4.1: Oracle