A Self-Explainable Heterogeneous GNN for Relational Deep Learning

Francesco Ferrini, Antonio Longa, Andrea Passerini, Manfred Jaeger

TL;DR

This work addresses learning informative meta-paths in heterogeneous graphs derived from relational databases, where class labels depend on aggregate statistics across multiple meta-path occurrences. It introduces Meta-Path Statistics GNN (MPS-GNN), a self-explainable approach that greedily constructs meta-paths via a local surrogate multi-instance objective and a relation-scoring loss, then trains a skip-connected multi-relational GNN along the learned paths. By using counts-of-counts and related statistics, MPS-GNN achieves superior predictive performance on synthetic and real-world relational databases while providing faithful, meta-path–based explanations. The method scales linearly with the number of relations and yields interpretable meta-paths that align with domain insights, making it particularly suitable for relational deep learning tasks in complex databases.

Abstract

Recently, significant attention has been given to the idea of viewing relational databases as heterogeneous graphs, enabling the application of graph neural network (GNN) technology for predictive tasks. However, existing GNN methods struggle with the complexity of the heterogeneous graphs induced by databases with numerous tables and relations. Traditional approaches either consider all possible relational meta-paths, thus failing to scale with the number of relations, or rely on domain experts to identify relevant meta-paths. A recent solution does manage to learn informative meta-paths without expert supervision, but assumes that a node's class depends solely on the existence of a meta-path occurrence. In this work, we present a self-explainable heterogeneous GNN for relational data that supports models in which class membership depends on aggregate information obtained from multiple occurrences of a meta-path. Experimental results show that in the context of relational databases, our approach effectively identifies informative meta-paths that faithfully capture the model's reasoning mechanisms. It significantly outperforms existing methods in both synthetic and real-world scenarios.
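
To make the distinction between meta-path existence and aggregate statistics concrete, here is a minimal Python sketch on toy data. It is loosely modelled on the medical example in the Figure 1 caption (a positive patient has at least two exempt prescriptions, each containing at least two medications). The dictionaries, identifiers, and function names are hypothetical illustrations, and the sketch only evaluates a fixed, hand-written pattern; it is not the MPS-GNN method, which learns such meta-paths.

```python
# Toy data (hypothetical): which prescriptions belong to which patient,
# and for each prescription its exemption flag and medications.
patient_prescriptions = {
    "p1": ["rx1", "rx2"],
    "p2": ["rx3", "rx4"],
    "p3": ["rx5"],
}
prescription_info = {
    "rx1": {"exempt": True, "medications": ["m1", "m2"]},
    "rx2": {"exempt": True, "medications": ["m3", "m4", "m5"]},
    "rx3": {"exempt": True, "medications": ["m1", "m2"]},
    "rx4": {"exempt": True, "medications": ["m6"]},
    "rx5": {"exempt": False, "medications": ["m1", "m2"]},
}


def has_meta_path(patient):
    """Existence test: is there at least one
    patient -> prescription -> medication occurrence?"""
    return any(
        len(prescription_info[rx]["medications"]) > 0
        for rx in patient_prescriptions[patient]
    )


def counts_of_counts(patient, min_meds=2, min_prescriptions=2):
    """Aggregate test: count the exempt prescriptions that each contain
    at least `min_meds` medications, then threshold that count."""
    qualifying = sum(
        1
        for rx in patient_prescriptions[patient]
        if prescription_info[rx]["exempt"]
        and len(prescription_info[rx]["medications"]) >= min_meds
    )
    return qualifying >= min_prescriptions


for p in patient_prescriptions:
    print(p, has_meta_path(p), counts_of_counts(p))
```

In this toy setting every patient has at least one occurrence of the meta-path, so the existence test cannot separate them, whereas the counts-of-counts test singles out only `p1`.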

Paper Structure

This paper contains 36 sections, 12 equations, 10 figures, 14 tables, 1 algorithm.

Figures (10)

  • Figure 1: Left: Relational database schema for a medical domain. Right: Heterogeneous graph representation of (part of) the database. The highlighted subgraph shows a prototypical counts-of-counts pattern characterising positive patients, namely having at least two exempt prescriptions (represented by node feature T), each containing at least two medications. Existing heterogeneous GNNs struggle with these patterns as they need to learn a separate weight matrix for each edge type in the graph, while MPS-GNN is capable of learning the relevant meta-path without any direct user supervision.
  • Figure 2: Outline of greedy local meta-path construction
  • Figure 3: Scoring the first two relations
  • Figure 4: Bag generation and scoring of relation $c$.
  • Figure 5: Scoring relation $d$.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Definition 3.1: Relational Database
  • Definition 3.2: Heterogeneous graph
  • Definition 3.3: Meta-path