GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang

TL;DR

GraphAF introduces a flow-based autoregressive framework for molecular graph generation, enabling exact likelihood computation and efficient, parallel training. By integrating dequantization, BFS-based masking, and valency-aware sampling, GraphAF generates 100% chemically valid molecules when chemistry rules are applied, and enhances property optimization via reinforcement learning to achieve state-of-the-art results. The approach demonstrates strong density modeling performance on standard datasets and generalizes to diverse graph domains, offering a scalable path for drug discovery and material design. Overall, GraphAF combines the strengths of autoregressive generation and normalizing flows to deliver both validity and versatility in molecular graph generation and optimization.
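The two ingredients named above, dequantization and exact likelihood under an invertible map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform noise range `[0, scale)`, and the fixed `mu`/`sigma` values are assumptions for illustration. In the actual model the shift and scale would be produced by a network conditioned on the partially generated graph.

```python
import math
import random


def dequantize(one_hot, rng, scale=1.0):
    """Turn a discrete one-hot vector into a continuous one by adding
    uniform noise in [0, scale) to every entry (assumed noise scheme)."""
    return [x + scale * rng.random() for x in one_hot]


def affine_forward(z, mu, sigma):
    """Map data z to latent eps = (z - mu) / sigma.
    Also returns log|det J| of the map, which is -sum(log sigma)."""
    eps = [(zi - m) / s for zi, m, s in zip(z, mu, sigma)]
    log_det = -sum(math.log(s) for s in sigma)
    return eps, log_det


def affine_inverse(eps, mu, sigma):
    """Inverse map used at sampling time: z = eps * sigma + mu."""
    return [e * s + m for e, m, s in zip(eps, mu, sigma)]


def log_likelihood(z, mu, sigma):
    """Exact log-density of z under a standard normal prior on eps,
    via the change-of-variables formula."""
    eps, log_det = affine_forward(z, mu, sigma)
    log_prior = sum(-0.5 * (e * e + math.log(2 * math.pi)) for e in eps)
    return log_prior + log_det


rng = random.Random(0)
atom_one_hot = [0.0, 1.0, 0.0]          # hypothetical 3-type atom vocabulary
z = dequantize(atom_one_hot, rng)       # continuous version of the atom type
mu, sigma = [0.5, 1.5, 0.5], [0.5, 0.5, 0.5]  # placeholder conditioner outputs
eps, _ = affine_forward(z, mu, sigma)
z_back = affine_inverse(eps, mu, sigma)
recovered_type = max(range(len(z_back)), key=lambda k: z_back[k])
```

Because the noise is smaller than the gap between 0 and 1, taking the argmax of the inverted vector recovers the original discrete type, and the likelihood is computed in a single pass without any sampling.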

Abstract

Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
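The valency checking mentioned in point (3) is possible because generation proceeds one edge at a time, so a proposed bond can be rejected before it is added. The sketch below is a hedged illustration of that idea only: the `MAX_VALENCE` table, the dictionary-based bond store, and the `propose` callback are hypothetical stand-ins, not the paper's code or any chemistry library's API.

```python
# Typical maximum valences for a few common atoms (illustrative subset).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def used_valence(atom_idx, bonds):
    """Sum the bond orders of all existing bonds touching atom_idx."""
    return sum(order for (i, j), order in bonds.items() if atom_idx in (i, j))


def bond_is_valid(atoms, bonds, i, j, order):
    """A new bond is valid if neither endpoint exceeds its maximum valence."""
    for idx in (i, j):
        if used_valence(idx, bonds) + order > MAX_VALENCE[atoms[idx]]:
            return False
    return True


def sample_bond(atoms, bonds, i, j, propose, max_tries=50):
    """Rejection-sample a bond order between atoms i and j.
    `propose` is a hypothetical callable returning 0 (no bond), 1, 2 or 3.
    Falls back to 'no bond' if every proposal violates valency."""
    for _ in range(max_tries):
        order = propose()
        if order == 0 or bond_is_valid(atoms, bonds, i, j, order):
            return order
    return 0


atoms = ["C", "O", "O"]
bonds = {(0, 1): 2}                      # C=O double bond already placed
can_extend_oxygen = bond_is_valid(atoms, bonds, 1, 2, 1)   # O would reach valence 3
can_extend_carbon = bond_is_valid(atoms, bonds, 0, 2, 2)   # C reaches 4, O reaches 2
```

Here the oxygen at index 1 is already saturated, so any further bond to it is rejected, while a second double bond on the carbon is accepted.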

Paper Structure

This paper contains 23 sections, 12 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the proposed GraphAF model. (a) Illustration of the generative procedure. New nodes or edges are marked in red. Starting from an empty graph, the model iteratively samples random variables and maps them to atom/bond features. The numbered first three steps correspond to the maps in the bottom part of panel (b). (b) Computation graph of GraphAF. The left side shows the nodes and edges; the right side shows the latent variables.
  • Figure 2: Molecules generated in property optimization and constrained property optimization tasks. (a) Molecules with high penalized logP scores. (b) Molecules with high QED scores. (c) Two pairs of molecules in constrained property optimization for penalized logP with similarity 0.71 (top) and 0.64 (bottom).
  • Figure 3: Visualizations of training graphs and generated graphs of EGO-SMALL.
  • Figure 4: Visualizations of training graphs and generated graphs of COMMUNITY-SMALL.
  • Figure 5: 50 molecules sampled from prior.
  • ...and 2 more figures