A Self-Explainable Heterogeneous GNN for Relational Deep Learning
Francesco Ferrini, Antonio Longa, Andrea Passerini, Manfred Jaeger
TL;DR
This work addresses learning informative meta-paths in heterogeneous graphs derived from relational databases, where class labels depend on aggregate statistics across multiple meta-path occurrences. It introduces Meta-Path Statistics GNN (MPS-GNN), a self-explainable approach that greedily constructs meta-paths via a local surrogate multi-instance objective and a relation-scoring loss, then trains a skip-connected multi-relational GNN along the learned paths. By using counts-of-counts and related statistics, MPS-GNN achieves superior predictive performance on synthetic and real-world relational databases while providing faithful, meta-path–based explanations. The method scales linearly with the number of relations and yields interpretable meta-paths that align with domain insights, making it particularly suitable for relational deep learning tasks in complex databases.
Abstract
Recently, significant attention has been given to the idea of viewing relational databases as heterogeneous graphs, enabling the application of graph neural network (GNN) technology for predictive tasks. However, existing GNN methods struggle with the complexity of the heterogeneous graphs induced by databases with numerous tables and relations. Traditional approaches either consider all possible relational meta-paths, thus failing to scale with the number of relations, or rely on domain experts to identify relevant meta-paths. A recent solution does manage to learn informative meta-paths without expert supervision, but assumes that a node's class depends solely on the existence of a meta-path occurrence. In this work, we present a self-explainable heterogeneous GNN for relational data that supports models in which class membership depends on aggregate information obtained from multiple occurrences of a meta-path. Experimental results show that, in the context of relational databases, our approach effectively identifies informative meta-paths that faithfully capture the model's reasoning mechanisms. It significantly outperforms existing methods in both synthetic and real-world scenarios.
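The distinction between the existence of a meta-path occurrence and an aggregate over its occurrences can be illustrated with a minimal sketch. It is not the MPS-GNN implementation; the toy graph, the relation names (`places`, `contains`), the node names, and the helper `count_occurrences` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy heterogeneous graph (hypothetical data):
# edges[relation][source] -> list of target nodes.
edges = {
    "places": {"alice": ["o1", "o2", "o3"], "bob": ["o4"]},
    "contains": {"o1": ["p1"], "o2": ["p1", "p2"], "o3": ["p3"], "o4": ["p1"]},
}

def count_occurrences(node, meta_path):
    """Count the walks that start at `node` and follow the relations in `meta_path`."""
    frontier = {node: 1}  # node -> number of walks reaching it so far
    for relation in meta_path:
        next_frontier = defaultdict(int)
        for src, n_walks in frontier.items():
            for dst in edges[relation].get(src, []):
                next_frontier[dst] += n_walks
        frontier = next_frontier
    return sum(frontier.values())

meta_path = ["places", "contains"]
counts = {u: count_occurrences(u, meta_path) for u in ("alice", "bob")}
exists = {u: c > 0 for u, c in counts.items()}

print(counts)  # {'alice': 4, 'bob': 1}
print(exists)  # {'alice': True, 'bob': True}
```

In this toy case both nodes have at least one occurrence of the meta-path, so an existence-based signal cannot separate them, whereas the occurrence count can. A label defined by a threshold on such a count is one simple example of the aggregate dependence described above.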
