InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization

Yifan Niu, Ziqi Gao, Tingyang Xu, Yang Liu, Yatao Bian, Yu Rong, Junzhou Huang, Jia Li

TL;DR

This work tackles multi-objective molecular optimization by introducing InversionGNN, a dual-path graph neural network that learns chemical-property knowledge via a direct prediction path and exploits that knowledge by gradient-guided edits in a second inversion path. The inversion path operates on differentiable scaffolding trees and uses a gradient-based Pareto search, via a non-dominating descent direction computed through a Quadratic Program, to generate Pareto-optimal molecules in discrete chemical space. The authors provide a convergence analysis showing that their relaxation yields an approximation to the true Pareto front within a bounded number of iterations and demonstrate strong, sample-efficient performance on synthetic benchmarks and real drug-design tasks, including weight-conditioned Pareto front exploration. The approach yields high Pareto diversity and property scores with substantially fewer oracle evaluations, enabling practical guidance for designing molecules that balance potency, safety, and synthesizability in real-world drug discovery.
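The idea of a descent direction obtained from a Quadratic Program can be illustrated with the classic two-objective min-norm problem (MGDA-style). This is a minimal sketch under stated assumptions, not the paper's QP: the paper's formulation also conditions on the weight vector, which this version ignores. The function name `min_norm_direction` and the toy gradients are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_direction(
    g1: Sequence[float], g2: Sequence[float]
) -> Tuple[float, List[float]]:
    """Solve min over a in [0, 1] of ||a*g1 + (1-a)*g2||^2 in closed form.

    Assumption: this is the generic two-gradient min-norm QP, used here only
    to illustrate the idea; it is not the weight-conditioned QP of the paper.
    Returns the mixing coefficient a and the combined vector d.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        a = 0.5  # identical gradients: any convex combination is the same
    else:
        # Unconstrained minimizer, then clipped to the interval [0, 1].
        a = dot([y - x for x, y in zip(g1, g2)], g2) / denom
        a = min(1.0, max(0.0, a))
    d = [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]
    return a, d


# Hypothetical loss gradients for two conflicting objectives.
a, d = min_norm_direction([1.0, 0.0], [0.0, 1.0])
# d has a non-negative inner product with both gradients,
# so a small step along -d does not increase either loss to first order.
print(a, d, dot(d, [1.0, 0.0]), dot(d, [0.0, 1.0]))
```

When `d` is the zero vector, no common descent direction exists and the current point is Pareto-stationary for the two objectives.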

Abstract

Exploring chemical space to find novel molecules that simultaneously satisfy multiple properties is crucial in drug discovery. However, existing methods often struggle to trade off multiple properties because chemical properties can be conflicting or correlated. To tackle this issue, we introduce the InversionGNN framework, an effective and sample-efficient dual-path graph neural network (GNN) for multi-objective drug discovery. In the direct prediction path of InversionGNN, we train the model for multi-property prediction to acquire knowledge of the optimal combination of functional groups. The learned chemical knowledge then helps the inversion generation path to generate molecules with the required properties. To decode the complex knowledge of multiple properties in the inversion path, we propose a gradient-based Pareto search method that balances conflicting properties and generates Pareto-optimal molecules. Additionally, InversionGNN can approximately search the full Pareto front in discrete chemical space. Comprehensive experimental evaluations show that InversionGNN is both effective and sample-efficient in various discrete multi-objective settings, including drug discovery.
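To make the notion of Pareto-optimal molecules concrete, the following minimal sketch filters a set of candidates down to the non-dominated ones. It assumes every property score is to be maximized; the score tuples are made-up illustrative values, not results from the paper, and the helper names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every property and strictly
    better on at least one (all properties are assumed to be maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (property 1, property 2) scores for five candidate molecules.
scores = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(scores)
print(front)
```

The filter is quadratic in the number of candidates, which is adequate for a sketch; it only ranks already-scored candidates and says nothing about how they are generated.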

Paper Structure

This paper contains 38 sections, 4 theorems, 28 equations, 10 figures, 4 tables, 4 algorithms.

Key Result

Theorem 4.2

Under the assumptions stated in the appendix, given an initial molecule $\boldsymbol{x}^0$ and a weight vector $\boldsymbol{\lambda}$, InversionGNN guarantees the following approximation when performing $T$ optimization rounds: where $\gamma = \frac{1-\alpha^{T}}{(1-\alpha)N}$, $\boldsymbol{\lambda}^{-1} = (1/\lambda_1,\ldots,1/\lambda_m)$, $\check{\lambda}^*$ and $\check{\lambda}^0$ is the maximum re

Figures (10)

  • Figure 1: (1) InversionGNN. A surrogate Oracle GNN is trained to incorporate complicated chemical knowledge. In the direct prediction path, a molecule $\boldsymbol{x}^t$ is fed to the GNN to obtain the objective function at the $t$-th iteration. In the inversion path, we calculate the non-dominating gradient to find local Pareto-optimal molecules $\mathcal{P}^t$ conditioned on a given weight vector $\boldsymbol{\lambda}$. (2) Pareto Front Search. Approximately exploring the full Pareto front with various weight vectors improves the Pareto diversity of the generated molecules.
  • Figure 2: Pareto front (black solid curve) for two loss functions $l_1$, $l_2$, with solutions (circles) and Oracle calls (computational cost) for different weights $\alpha = \frac{\lambda_1}{\lambda_2}$ (dashed rays). The Pareto-optimal solution conditioned on a weight $\lambda$ is the intersection point of the Pareto front and the corresponding $\lambda^{-1}$ ray.
  • Figure 3: Pareto Front Search.
  • Figure 4: The distribution of Top-$100$ JNK3 scores.
  • Figure 5: Optimization process of InversionGNN on JNK3 and GSK3$\beta$ with the weight vector $[1,3]$. (a) As substructures are added, the property scores obtained by InversionGNN increase and become more aligned with the weight vector. (b) Visualization of the corresponding molecular graphs.
  • ...and 5 more figures
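
The ray construction in the Figure 2 caption can be sketched numerically. A point $l$ lies on the ray with direction $\lambda^{-1} = (1/\lambda_1, 1/\lambda_2)$ exactly when $\lambda_1 l_1 = \lambda_2 l_2$, so the gap between the largest and smallest $\lambda_i l_i$ measures how far a candidate is from that ray. The sketch below assumes the ray starts at the origin of loss space and uses a made-up linear front; `ray_gap`, `closest_to_ray`, and `toy_front` are hypothetical names, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def ray_gap(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Spread of the weighted losses; zero exactly when the point lies on
    the ray through the origin with direction (1/w_1, ..., 1/w_m)."""
    scaled = [w * l for w, l in zip(weights, losses)]
    return max(scaled) - min(scaled)


def closest_to_ray(
    front: List[Tuple[float, ...]], weights: Sequence[float]
) -> Tuple[float, ...]:
    """Pick the candidate on a discretized front that is nearest to the ray."""
    return min(front, key=lambda losses: ray_gap(losses, weights))


# Made-up discretized front l_2 = 1 - l_1 (illustrative values only).
toy_front = [(0.0, 1.0), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1.0, 0.0)]

print(closest_to_ray(toy_front, (1.0, 1.0)))
print(closest_to_ray(toy_front, (1.0, 3.0)))
```

Changing the weights moves the selected point along the front, which is the mechanism that sweeping over weight vectors relies on to cover the front.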

Theorems & Definitions (10)

  • Definition 4.1: Multi-Objective Molecular Optimization (MOMO)
  • Theorem 4.2: Approximation Guarantee
  • Definition B.2: Non-Uniformity
  • Definition B.3: Dominant Set
  • Definition B.4: Uniform Set
  • Definition B.5: Admissible Set
  • Lemma B.6: Bounded Objective Space for the Next Iteration
  • Corollary B.7: Convergence of Admissible Set
  • Theorem B.9: Approximation Guarantee
  • proof