Controllable Graph Generation with Diffusion Models via Inference-Time Tree Search Guidance
Jiachi Zhao, Zehong Wang, Yamei Liao, Chuxu Zhang, Yanfang Ye
TL;DR
TreeDiff is an inference-time, planning-based framework for controllable graph generation with diffusion models that uses Monte Carlo Tree Search (MCTS) to navigate long denoising trajectories. It combines three components: macro-step expansion to reduce search depth, dual-space denoising to couple latent-space efficiency with graph-level fidelity, and a dual-space verifier to provide rollout-free long-horizon value estimates. Empirical results on 2D and 3D molecular generation show state-of-the-art performance in both conditional and unconditional settings, with robust inference-time scaling and improved stability over existing inference-time guidance methods. The approach enables multi-objective, design-driven graph generation without retraining, which is relevant to drug discovery and materials science, where structural validity and property alignment are crucial.
Abstract
Graph generation is a fundamental problem in graph learning with broad applications across Web-scale systems, knowledge graphs, and scientific domains such as drug and materials discovery. Recent approaches leverage diffusion models for step-by-step generation, yet unconditional diffusion offers little control over desired properties, often leading to unstable quality and difficulty in incorporating new objectives. Inference-time guidance methods mitigate these issues by adjusting the sampling process without retraining, but they remain inherently local, heuristic, and limited in controllability. To overcome these limitations, we propose TreeDiff, a Monte Carlo Tree Search (MCTS) guided dual-space diffusion framework for controllable graph generation. TreeDiff is a plug-and-play inference-time method that expands the search space while keeping computation tractable. Specifically, TreeDiff introduces three key designs that make the search practical and scalable: (1) a macro-step expansion strategy that groups multiple denoising updates into a single transition, reducing tree depth and enabling long-horizon exploration; (2) a dual-space denoising mechanism that couples efficient latent-space denoising with lightweight discrete correction in graph space, ensuring both scalability and structural fidelity; and (3) a dual-space verifier that predicts long-term rewards from partially denoised graphs, enabling early value estimation and removing the need for full rollouts. Extensive experiments on 2D and 3D molecular generation benchmarks, under both unconditional and conditional settings, demonstrate that TreeDiff achieves state-of-the-art performance. Notably, TreeDiff exhibits favorable inference-time scaling: it continues to improve with additional computation, while existing inference-time methods plateau early under limited resources.
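To make the search loop concrete, the following is a minimal toy sketch of MCTS over a denoising trajectory with macro-step expansion and rollout-free value estimates. It is not the TreeDiff implementation: the state is a single float rather than a graph, `denoise_step` and `verifier` are hypothetical stand-ins for a trained diffusion model and a learned verifier, and the dual-space (latent plus graph-space correction) mechanism is not modeled. The names `tree_search`, `macro_step`, `branching`, and the UCB constant are assumptions chosen for illustration.

```python
import math
import random


class Node:
    def __init__(self, state, steps_left, parent=None):
        self.state = state            # partially denoised sample (toy: a float)
        self.steps_left = steps_left  # denoising updates still to apply
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0


def denoise_step(x, rng):
    # Hypothetical stand-in for one reverse-diffusion update.
    return 0.9 * x + rng.gauss(0.0, 0.3)


def macro_step(x, k, rng):
    # Macro-step: group k denoising updates into a single tree transition.
    for _ in range(k):
        x = denoise_step(x, rng)
    return x


def verifier(x, target):
    # Hypothetical stand-in for a learned verifier: it scores a partially
    # denoised state directly, so no rollout to the final sample is needed.
    return -abs(x - target)


def ucb(child, parent_visits, c=1.0):
    if child.visits == 0:
        return float("inf")
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def tree_search(x0, total_steps, k, target, n_sim, branching, rng):
    root = Node(x0, total_steps)
    for _ in range(n_sim):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal.
        while node.steps_left > 0 and len(node.children) >= branching:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # Expansion: sample one new child via a stochastic macro-step.
        if node.steps_left > 0:
            step = min(k, node.steps_left)
            child = Node(macro_step(node.state, step, rng),
                         node.steps_left - step, node)
            node.children.append(child)
            node = child
        # Evaluation: verifier score replaces a full rollout.
        value = verifier(node.state, target)
        # Backup: propagate the value to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Follow the most-visited path, then finish any remaining steps unguided.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return macro_step(node.state, node.steps_left, rng), root


if __name__ == "__main__":
    sample, root = tree_search(x0=3.0, total_steps=12, k=4, target=0.5,
                               n_sim=60, branching=3, rng=random.Random(0))
    print(sample, root.visits)
```

In this sketch, grouping `k = 4` updates per transition reduces a 12-step trajectory to a tree of depth 3, which is the effect the macro-step strategy is described as targeting; increasing `n_sim` corresponds to spending more inference-time computation on the search.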
