A Probabilistic Model for Node Classification in Directed Graphs
Diego Huerta, Gerardo Arizmendi
TL;DR
This work develops a probabilistic classifier for directed graphs with node attributes, enabling inductive label prediction for unseen nodes via maximum likelihood (ML) or maximum a posteriori (MAP) estimation. The model explicitly specifies a generative process using parameters $\pi$, $\Theta$, $\Xi$ and conditional distributions $\psi_i$, $\phi_i$, and $\omega_i$, with interpretable terms derived from the first-order neighborhood and node attributes. It demonstrates competitive performance against neural baselines on two datasets, the Math Genealogy Project (MGP) and ogbn-arxiv, while offering clear interpretability of each decision component. The authors also introduce a new MGP-derived dataset and provide comprehensive baselines and hyperparameter strategies, highlighting practical applicability and efficiency for large graphs with textual attributes.
Abstract
In this work, we present a probabilistic model for directed graphs whose nodes have attributes and labels. The model serves as a generative classifier that predicts the labels of unseen nodes using either maximum likelihood or maximum a posteriori estimation. Its predictions are highly interpretable, in contrast with common node classification methods such as graph neural networks. We applied the model to two datasets, demonstrating predictive performance that is competitive with, and in some cases superior to, state-of-the-art methods. One of the datasets is adapted from the Math Genealogy Project, which has not previously been used for node classification. We therefore evaluated several classification algorithms on this dataset, both to compare them with our model and to provide benchmarks for this new resource.
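To make the distinction between the two estimation modes concrete, the following is a minimal sketch of how a generic generative classifier picks a label under maximum likelihood versus maximum a posteriori scoring. It is not the model proposed in this work: the function name `predict_label`, the three-class setup, and all numeric values are assumptions chosen purely for illustration.

```python
import numpy as np


def predict_label(log_prior, log_likelihood, use_map=True):
    """Return the index of the highest-scoring class.

    log_prior:      shape (K,), log p(y = k)           (hypothetical values)
    log_likelihood: shape (K,), log p(observed | y = k) (hypothetical values)
    use_map:        if True, score by log-likelihood + log-prior (MAP);
                    if False, score by log-likelihood alone (ML).
    """
    scores = log_likelihood + log_prior if use_map else log_likelihood
    return int(np.argmax(scores))


# Made-up numbers for a single unseen node and three candidate labels.
log_prior = np.log(np.array([0.7, 0.2, 0.1]))
log_likelihood = np.log(np.array([0.10, 0.30, 0.25]))

ml_label = predict_label(log_prior, log_likelihood, use_map=False)
map_label = predict_label(log_prior, log_likelihood, use_map=True)
print("ML label:", ml_label, "MAP label:", map_label)
```

With these toy values the two rules disagree: the likelihood alone favors the second class, while the strong prior on the first class tips the MAP decision toward it. Because each score is a sum of named log-terms, the contribution of every term to the final decision can be inspected directly.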
