Graph VQ-Transformer (GVT): Fast and Accurate Molecular Generation via High-Fidelity Discrete Latents
Haozhuo Zheng, Cheng Wang, Yang Liu
TL;DR
GVT is a two-stage molecular generator: a Graph VQ-VAE first learns high-fidelity discrete latent representations of molecular graphs, and an autoregressive Transformer is then trained on these latents to generate new molecules. The decoder uses canonical Reverse Cuthill-McKee (RCM) node ordering and Rotary Positional Embeddings (RoPE) to achieve near-perfect graph reconstruction, addressing structural ambiguity. On benchmarks such as QM9, ZINC250k, MOSES, and GuacaMol, GVT attains state-of-the-art or competitive performance, with strong distribution-similarity metrics (FCD, KL divergence) and significantly faster generation than diffusion methods. By reframing graph generation as discrete latent sequence modeling, GVT offers a scalable, efficient alternative that aligns molecular design with large-scale language-model paradigms and sets a strong baseline for future discrete-latent molecular generation.
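To make the role of RCM ordering concrete, here is a minimal sketch of a Reverse Cuthill-McKee traversal on a small graph. It is an illustration, not the authors' implementation: the function names `rcm_order` and `bandwidth`, the dictionary-based adjacency format, and the tie-breaking by node label are all assumptions made here so that the example is deterministic and self-contained.

```python
from collections import deque


def rcm_order(adjacency):
    """Return a Reverse Cuthill-McKee ordering of an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors.
    Ties are broken by node label (an assumption made for determinism).
    """
    degree = {v: len(set(nbrs)) for v, nbrs in adjacency.items()}
    visited = set()
    order = []
    # Handle each connected component, starting from a lowest-degree node.
    for start in sorted(adjacency, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, enqueuing neighbors by increasing degree.
            unvisited = set(adjacency[v]) - visited
            for u in sorted(unvisited, key=lambda u: (degree[u], u)):
                visited.add(u)
                queue.append(u)
    # Reversing the Cuthill-McKee order gives the RCM order.
    order.reverse()
    return order


def bandwidth(adjacency, order):
    """Largest distance in the ordering between any two adjacent nodes."""
    pos = {v: i for i, v in enumerate(order)}
    return max(
        (abs(pos[u] - pos[v]) for v in adjacency for u in adjacency[v]),
        default=0,
    )


# A five-atom chain whose node labels are scrambled: 0-3-1-4-2.
chain = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
order = rcm_order(chain)
```

On this chain, the labels as given place bonded atoms up to three positions apart, while the RCM order places every bonded pair in adjacent positions. Keeping neighbors close together in the sequence is the property that makes such an ordering a natural fit for position-aware sequence models.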
Abstract
The de novo generation of molecules with desirable properties is a critical challenge: diffusion models are computationally intensive, and autoregressive models struggle with error propagation. In this work, we introduce the Graph VQ-Transformer (GVT), a two-stage generative framework that achieves both high accuracy and efficiency. The core of our approach is a novel Graph Vector Quantized Variational Autoencoder (VQ-VAE) that compresses molecular graphs into high-fidelity discrete latent sequences. By combining a Graph Transformer with canonical Reverse Cuthill-McKee (RCM) node ordering and Rotary Positional Embeddings (RoPE), our VQ-VAE achieves near-perfect reconstruction rates. An autoregressive Transformer is then trained on these discrete latents, effectively converting graph generation into a well-structured sequence modeling problem. Crucially, this mapping of complex graphs to high-fidelity discrete sequences bridges molecular design with the powerful paradigm of large-scale sequence modeling, unlocking potential synergies with Large Language Models (LLMs). Extensive experiments show that GVT achieves state-of-the-art or highly competitive performance across major benchmarks such as ZINC250k, MOSES, and GuacaMol, and notably outperforms leading diffusion models on key distribution-similarity metrics such as FCD and KL divergence. With its strong performance, efficiency, and architectural novelty, GVT not only presents a compelling alternative to diffusion models but also establishes a strong new baseline for the field, paving the way for future research in discrete latent-space molecular generation.
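The step that turns continuous encoder outputs into a discrete latent sequence can be sketched with the standard VQ-VAE nearest-neighbor lookup. This is an illustrative example only: the abstract does not specify the codebook size, the distance measure, or whether quantization is applied per node, so the squared Euclidean distance, the per-node treatment, the toy values, and the name `quantize` are all assumptions.

```python
def quantize(vectors, codebook):
    """Map each vector to its nearest codebook entry (squared L2 distance).

    Returns the list of code indices (the discrete tokens) and the list of
    corresponding codebook vectors (the quantized latents).
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    indices = [
        min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
        for v in vectors
    ]
    quantized = [codebook[k] for k in indices]
    return indices, quantized


# Hypothetical toy codebook with three 2-dimensional entries.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Hypothetical encoder outputs, one per node, already in a fixed node order.
node_embeddings = [[0.9, 0.1], [0.1, 0.8], [0.05, -0.1]]

tokens, quantized = quantize(node_embeddings, codebook)
print(tokens)  # [1, 2, 0]
```

The resulting index list is the kind of token sequence an autoregressive Transformer can be trained on with an ordinary next-token objective, and a decoder can map sampled indices back to codebook vectors before reconstructing a graph.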
