Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization

Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Magdalena Ortiz, Matias Selin, Mantas Šimkus

TL;DR

The paper tackles efficient graph classification by converting graphs into fixed-length tabular features derived from logic-guided Weisfeiler–Leman variants. It introduces a generalized WL framework, $\mathcal{Q}$-WL, together with PL($\mathcal{Q}$) and a generalized bisimulation, to characterize expressive power. The proposed $\mathcal{Q}$-WL-RF pipeline converts node-type frequencies into tabular data and trains random forests, reaching accuracy competitive with Graph Transformers and GNNs while running 40–60x faster and using less memory. The approach is validated on 14 diverse benchmarks, and the paper outlines future directions for richer quantifiers and alternative tabular learners.

Abstract

We present a novel approach for graph classification based on tabularizing graph data via new variants of the Weisfeiler-Leman algorithm and then applying methods for tabular data. We investigate a comprehensive class of versions of the Weisfeiler-Leman algorithm obtained by modifying the underlying logical framework and establish a precise theoretical characterization of their expressive power. We then test selected versions on 14 benchmark datasets that span a range of application domains. The experiments demonstrate that our approach generally achieves better predictive performance than graph neural networks and matches that of graph transformers, while being 40-60x faster and requiring neither a GPU nor extensive hyperparameter tuning.

Paper Structure

This paper contains 18 sections, 10 theorems, 27 equations, 4 figures, 10 tables.

Key Result

Proposition 3

Let $(\mathfrak{M},w)$ and $(\mathfrak{N},w')$ be pointed Kripke models. The graded-bisimulation game is sound and complete in the sense that the following are equivalent:

Figures (4)

  • Figure 1: Comparison of three graph classification paradigms. All methods begin by transforming node labels based on neighborhood structure: The Weisfeiler--Leman Graph Kernel uses WL, GNNs/Graph Transformers use message passing or attention mechanisms, and $\mathcal{Q}$-WL-RF uses $\mathcal{Q}$-WL for some set of quantifiers $\mathcal{Q}$. After transformation, Graph Kernels compute pairwise similarities and classify with an SVM; GNNs/Graph Transformers pool node embeddings into graph-level representations and classify with an MLP; $\mathcal{Q}$-WL-RF counts the frequency of each $\mathcal{Q}$-modal type to produce a tabular dataset and classifies with a random forest.
  • Figure 2: Critical difference diagram (Demšar, 2006) comparing test performance across methods. The nine methods are listed by mean rank, lower being better. Methods with no statistically significant difference, as determined by the Nemenyi test with significance $\alpha=0.05$, are connected with a bar.
  • Figure 2: Mean runtime and peak memory consumption across methods. Runtimes measure time used for hyperparameter optimization, training and inference per train/validation/test split.
  • Figure 3: Memory consumption of (Full) $\mathcal{Q}$-WL-RF and WL-GK as a function of dataset size. Unlike the graph kernel, our method remains efficient even on large datasets.
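
The tabularization step described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it uses standard 1-WL colour refinement as a stand-in for $\mathcal{Q}$-WL, uses raw counts rather than any particular frequency normalization, and the names `wl_refine` and `tabularize` are hypothetical. Each graph becomes one row whose columns count how many nodes carry each refined type.

```python
from collections import Counter


def wl_refine(adj, labels, rounds):
    """Standard 1-WL colour refinement (a stand-in for Q-WL, which is an assumption here).

    adj: dict mapping node -> list of neighbours
    labels: dict mapping node -> initial label
    Returns a list of colourings, one per round, starting with round 0.
    """
    colors = dict(labels)
    history = [dict(colors)]
    for _ in range(rounds):
        # New colour = own colour plus the multiset of neighbour colours.
        colors = {
            v: (
                colors[v],
                tuple(sorted(Counter(colors[u] for u in adj[v]).items(), key=repr)),
            )
            for v in adj
        }
        history.append(dict(colors))
    return history


def tabularize(graphs, rounds=1):
    """Turn a list of (adj, labels) graphs into a fixed-width count table."""
    per_graph = []
    for adj, labels in graphs:
        counts = Counter()
        for r, colors in enumerate(wl_refine(adj, labels, rounds)):
            # A "type" is identified by the round and the colour reached in it.
            counts.update((r, c) for c in colors.values())
        per_graph.append(counts)
    # Columns are all types seen anywhere in the dataset, in a fixed order.
    columns = sorted({t for counts in per_graph for t in counts}, key=repr)
    rows = [[counts.get(t, 0) for t in columns] for counts in per_graph]
    return columns, rows


# Toy dataset: a triangle and a 3-node path, all nodes labelled "a".
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "a", 1: "a", 2: "a"})
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "a", 1: "a", 2: "a"})

columns, rows = tabularize([triangle, path], rounds=1)
for row in rows:
    print(row)
```

The resulting rows could then be passed to any tabular learner, for example scikit-learn's `RandomForestClassifier`; that step is omitted here to keep the sketch dependency-free.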

Theorems & Definitions (23)

  • Definition 1
  • Example 2
  • Proposition 3
  • Example 4
  • Theorem 5
  • proof : Proof sketch.
  • Example 6
  • Lemma 7
  • proof
  • Lemma 8
  • ...and 13 more