Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks
Chaojie Wang, Xinyang Liu, Dongsheng Wang, Hao Zhang, Bo Chen, Mingyuan Zhou
TL;DR
The paper addresses the limitations of Gaussian-parameterized VGAEs in modeling document relational networks by introducing a non-Gaussian, deep RTM-based decoder (GPFA/GPGBN) and pairing it with Weibull-based encoders to form Weibull Graph Autoencoders (WGAEs). The GPFA provides analytic posteriors for joint modeling of node features and links, while GPGBN extends this to a multi-layer hierarchical RTM that captures hierarchical semantic topics and multilevel relationships. Two encoder variants—a vanilla graph convolutional encoder (WGCAE) and a Bayesian attention-based encoder (WGAAE)—are integrated with the GPGBN decoder, and training supports both full-batch and scalable mini-batch regimes, including a subgraph decoding strategy to reduce cost. Experimental results demonstrate stronger hierarchical latent representations and competitive performance on link prediction, clustering, and classification tasks, with scalability to large graphs such as MAG240M. Overall, the work offers a scalable, interpretable framework for multilevel DRN analysis that combines deep RTMs with flexible, non-Gaussian variational inference.
Abstract
Although variational graph autoencoders (VGAEs) have been widely used for modeling and generating graph-structured data, most of them are still not flexible enough to approximate sparse and skewed latent node representations, especially those of document relational networks (DRNs) with discrete observations. For analyzing a collection of interconnected documents, relational topic models (RTMs), a typical branch of Bayesian models, have proven effective at describing both the link structures and the document contents of DRNs, which motivates us to combine RTMs with existing VGAEs to alleviate these issues when modeling the generation of DRNs. In this paper, moving beyond the sophisticated approximate assumptions of traditional RTMs, we develop graph Poisson factor analysis (GPFA), which provides analytic conditional posteriors to improve inference accuracy, and extend GPFA to a multi-stochastic-layer version, named graph Poisson gamma belief network (GPGBN), to capture hierarchical document relationships at multiple semantic levels. Then, taking GPGBN as the decoder, we combine it with various Weibull-based graph inference networks, resulting in two variants of the Weibull graph auto-encoder (WGAE), equipped with model inference algorithms. Experimental results demonstrate that our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
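To make the two main ingredients concrete, the following is a minimal single-layer sketch, not the authors' implementation. It assumes the standard Weibull reparameterization (a draw is lam * (-log(1 - eps))^(1/k) with eps uniform on (0, 1)) for the latent document representations, a Poisson likelihood for word counts, and a Bernoulli-Poisson link for edges. The abstract does not state the likelihood forms, so the link function, the names `sample_weibull` and `generate_drn`, the topic weights `u`, and all sizes are illustrative assumptions.

```python
import numpy as np

def sample_weibull(k, lam, rng):
    """Reparameterized Weibull draw with shape k and scale lam (same-shaped arrays)."""
    eps = rng.uniform(size=np.shape(k))          # eps ~ Uniform[0, 1)
    return lam * (-np.log1p(-eps)) ** (1.0 / k)  # nonnegative, differentiable in k and lam

def generate_drn(theta, phi, u, rng):
    """Generate word counts and links from latent document representations.

    theta: (N, K) nonnegative document representations
    phi:   (V, K) topics, each column sums to 1
    u:     (K,)   nonnegative per-topic link weights (assumed form)
    """
    # Poisson factor analysis for document contents: x ~ Poisson(theta phi^T)
    x = rng.poisson(theta @ phi.T)
    # Assumed Bernoulli-Poisson link: P(a_ij = 1) = 1 - exp(-sum_k u_k theta_ik theta_jk)
    rate = (theta * u) @ theta.T
    p_link = 1.0 - np.exp(-rate)
    a = rng.uniform(size=p_link.shape) < p_link
    a = np.triu(a, 1)               # keep the upper triangle, drop self-links
    a = (a | a.T).astype(int)       # symmetrize to an undirected graph
    return x, a

rng = np.random.default_rng(0)
N, V, K = 6, 20, 3                  # documents, vocabulary size, topics (toy sizes)
k = np.full((N, K), 2.0)            # in a WGAE these would come from the graph encoder
lam = np.full((N, K), 0.5)
theta = sample_weibull(k, lam, rng)
phi = rng.dirichlet(np.ones(V), size=K).T
u = np.ones(K)
x, a = generate_drn(theta, phi, u, rng)
```

In this sketch `k` and `lam` are fixed constants; in an autoencoder they would be produced by the graph inference network, and the reparameterized draw is what lets gradients pass through the sampling step.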
