Graph Diffusion Policy Optimization

Yijing Liu; Chao Du; Tianyu Pang; Chongxuan Li; Min Lin; Wei Chen

TL;DR

GDPO optimizes graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. Experimental results show that it achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives.

Abstract

Recent research has made significant progress in optimizing diffusion models for downstream objectives, which is an important pursuit in fields such as graph generation for drug design. However, directly applying these methods to graphs presents challenges and results in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach to optimizing graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. GDPO is based on an eager policy gradient tailored for graph diffusion models, which was developed through careful analysis and promises improved performance. Experimental results show that GDPO achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO.

Paper Structure (24 sections, 15 equations, 7 figures, 7 tables)

Introduction
Related Works
Preliminaries
Graph Diffusion Probabilistic Models
Markov Decision Process and Policy Gradient
Method
A Markov Decision Process Formulation
Learning Graph DPMs with Policy Gradient
Graph Diffusion Policy Optimization
Reward Functions for Graph Generation
Reward Functions for General Graph Generation
Reward Functions for Molecular Graph Generation
Experiments
General Graph Generation
Molecule Property Optimization
...and 9 more sections

Figures (7)

Figure 1: Overview of GDPO. (1) In each optimization step, GDPO samples multiple generation trajectories from the current Graph DPM and queries the reward function with different $\bm{G}_0$. (2) For each trajectory, GDPO accumulates the gradient $\nabla_\theta \log p_\theta(\bm{G}_0|\bm{G}_t)$ of each $(\bm{G}_0, \bm{G}_t)$ pair and assigns a weight to the aggregated gradient based on the corresponding reward signal. Finally, GDPO estimates the eager policy gradient by averaging the aggregated gradient from all trajectories.
Figure 2: Toy experiment comparing DDPO and GDPO. We generate connected graphs with an increasing number of nodes. Node categories are disregarded, and the edge categories are binary, indicating whether two nodes are linked. The graph DPM is initialized randomly as a one-layer graph transformer from DiGress (Vignac et al., 2022). The number of diffusion steps $T$ is set to $50$, and the reward signal $r(\bm{G}_0)$ is defined as $1$ if $\bm{G}_0$ is connected and $0$ otherwise. We use $256$ trajectories for gradient estimation in each update. The learning curves show that the performance of DDPO diminishes as the number of nodes increases, while GDPO consistently performs well.
Figure 3: We investigate two key factors of GDPO on ZINC250k, with 5ht1b as the target protein. Similarly, the vertical axis represents the total queries, while the horizontal axis represents the average reward. (a) We vary the number of trajectories used for gradient estimation. (b) We fix the weights of $r_{\textsc{QED}}$ and $r_{\textsc{SA}}$, and change the weight of $r_{\textsc{NOV}}$ while ensuring that the total weight is 1.
Figure 4: Graph Diffusion Policy Optimization
Figure 5: We investigate the L2 distance between two consecutive steps in two types of DPMs. The number of diffusion steps is 1000 for both models.
...and 2 more figures
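
The captions of Figures 1 and 2 describe two pieces that are easy to sketch in code: the binary connectivity reward used in the toy experiment, and the way per-trajectory gradients are accumulated, weighted by reward, and averaged. The following is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`is_connected`, `connectivity_reward`, `eager_gradient_estimate`) are hypothetical, the per-step gradients $\nabla_\theta \log p_\theta(\bm{G}_0|\bm{G}_t)$ are assumed to be precomputed arrays, and the weight is assumed to be the raw reward (the caption only says the weight is based on the reward signal).

```python
import numpy as np


def is_connected(adj):
    """Return True if the undirected graph with 0/1 adjacency matrix `adj` is connected."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    if n == 0:
        return False
    seen = {0}
    stack = [0]
    # Depth-first search from node 0.
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def connectivity_reward(adj):
    """Toy reward from Figure 2: 1 if the generated graph G_0 is connected, else 0."""
    return 1.0 if is_connected(adj) else 0.0


def eager_gradient_estimate(rewards, per_step_grads):
    """Combine per-step gradients into one estimate, following the Figure 1 description.

    rewards:        shape (K,), one reward r(G_0) per trajectory.
    per_step_grads: shape (K, T, D); entry [k, t] stands in for the gradient of
                    log p_theta(G_0 | G_t) for trajectory k at step t (assumed precomputed).
    """
    rewards = np.asarray(rewards, dtype=float)
    grads = np.asarray(per_step_grads, dtype=float)
    accumulated = grads.sum(axis=1)             # accumulate over steps: (K, D)
    weighted = rewards[:, None] * accumulated   # weight by reward (assumption: raw reward)
    return weighted.mean(axis=0)                # average over trajectories: (D,)
```

For example, a three-node path graph receives reward 1, while a graph with an isolated node receives reward 0, so only trajectories that end in connected graphs contribute to the estimate in this toy setting.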
