Will More Expressive Graph Neural Networks do Better on Generative Tasks?

Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao

TL;DR

This paper examines whether more expressive graph neural networks (GNNs) improve molecular graph generation. The authors replace the underlying GNNs of three graph generative frameworks (GCPN, GraphAF, and GraphEBM) with six advanced GNNs (GearNet, GIN, GAT, GATv2, PNA, and GSN) and evaluate them on six molecular generative objectives on the ZINC-250k dataset. The results show that advanced GNNs can boost performance, but that GNN expressiveness is not a necessary condition for a good GNN-based generative model. Edge-feature extraction and richer generation objectives emerge as key factors, with GearNet-based variants achieving substantial gains and approaching or surpassing state-of-the-art non-GNN baselines on DRD2, Median1, and Median2. The results suggest shifting focus from sheer expressiveness to mechanisms that make effective use of edge information and to meaningful objectives for de-novo molecular design.

Abstract

Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks -- autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM -- on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.

Paper Structure (49 sections, 21 equations, 5 figures, 11 tables)

Figures (5)

  • Figure 1: An overview of the GCPN model, demonstrating an example of iterative graph generation from an intermediate graph $G_{t}$ to an intermediate graph $G_{t+1}$. Part 1 is the illustration of the graph representation learning process based on a GNN. Part 2 is the illustration of the graph generative procedure based on a reinforcement learning (RL) agent. New nodes or edges are marked in red.
  • Figure 2: An overview of the GraphAF model, demonstrating an example of iterative graph generation from an intermediate graph $G_{t}$ to an intermediate graph $G_{t+1}$. Part 1 is the illustration of the graph representation learning process based on a GNN. Part 2 is the illustration of the graph generative procedure based on a flow-based model. New nodes or edges are marked in red.
  • Figure 3: An overview of the GraphEBM model, demonstrating an example of one-shot graph generation from an initialised node feature matrix $\mathbf{N}$ and adjacency matrix $\mathbf{A}$ to a generated molecular graph $G^{*}$ with validity correction. The figure illustrates the graph generative procedure based on Langevin dynamics with a trained energy function $E_{\theta^{*}}(\mathbf{N}, \mathbf{A})$ (parameterised by a GNN).
  • Figure 4: Molecules with the highest generation metrics (penalised logP and QED) generated by GNN-based graph generative models on de-novo molecule design tasks.
  • Figure 5: Molecules with the highest generation metrics (DRD2, Median1, Median2 and QED) generated by the proposed GNN-based graph generative models on de-novo molecule design tasks: (a) GCPN with R-GCN, (b) GraphAF with R-GCN, (c) GCPN with GearNet, (d) GraphAF with GearNet.
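
The Figure 3 caption describes one-shot generation as Langevin dynamics driven by a trained energy function. The sketch below is only an illustration of that sampling idea under loud assumptions: the state is a flat list of floats rather than the pair $(\mathbf{N}, \mathbf{A})$, the energy is a hypothetical quadratic `toy_energy` standing in for the GNN-parameterised $E_{\theta^{*}}$, and the update uses a commonly used Langevin form (a half-step along the negative energy gradient plus Gaussian noise). The names `toy_energy`, `toy_energy_grad`, and `langevin_sample`, as well as the step size and noise scale, are not taken from the paper.

```python
import random


def toy_energy(x, target):
    """Hypothetical stand-in energy: E(x) = 0.5 * sum((x_i - t_i)^2)."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def toy_energy_grad(x, target):
    """Gradient of the toy energy with respect to x."""
    return [xi - ti for xi, ti in zip(x, target)]


def langevin_sample(x0, target, steps=200, step_size=0.1, noise_std=0.01, seed=0):
    """Run a fixed number of Langevin updates starting from x0.

    Each update moves half a step along the negative energy gradient and
    adds zero-mean Gaussian noise, so samples drift towards low energy.
    """
    rng = random.Random(seed)  # local generator keeps the run reproducible
    x = list(x0)
    for _ in range(steps):
        grad = toy_energy_grad(x, target)
        x = [
            xi - 0.5 * step_size * gi + rng.gauss(0.0, noise_std)
            for xi, gi in zip(x, grad)
        ]
    return x


if __name__ == "__main__":
    start = [5.0, -3.0]
    low_energy_point = [1.0, 2.0]
    sample = langevin_sample(start, low_energy_point)
    print("energy before:", toy_energy(start, low_energy_point))
    print("energy after: ", toy_energy(sample, low_energy_point))
```

In a GraphEBM-style setting the gradient would instead come from differentiating the learned energy with respect to the node feature and adjacency matrices, and the final state would still need discretisation and the validity correction mentioned in the caption; neither step is modelled here.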

Theorems & Definitions (4)

  • Definition 1: Graph Representation Learning
  • Definition 2: GNN Expressiveness
  • Definition 3: GNN Edge Feature Extraction Ability
  • Definition 4: GNN One-hot Encoding Edge Feature Extraction Ability