
Beyond Message Passing: A Symbolic Alternative for Expressive and Interpretable Graph Learning

Chuqin Geng, Li Zhang, Haolin Ye, Ziyu Zhao, Yuhe Jiang, Tara Saba, Xinyu Wang, Xujie Si

TL;DR

This paper introduces SymGraph, a standalone symbolic graph-learning framework that overcomes the 1-WL expressivity barrier by replacing neural message passing with discrete structural hashing and orbit-based, topology-aware aggregation. It couples local, structure-aware predicates with a global, discrete predicate-count representation to realize a transparent, Q-DNF-style decision process, enabling interpretable rules aligned with domain knowledge such as SMARTS patterns. An evolutionary search with Master Tree Pre-Caching optimizes the semantic granularity of local predicates, yielding 10×–100× training-time speedups on CPU alone while maintaining or improving accuracy relative to state-of-the-art self-explainable GNNs. Empirically, SymGraph delivers state-of-the-art interpretability on molecular and synthetic graph tasks, with explanations that recover expert-validated motifs and provide finer semantic granularity than existing rule-based explainers, supporting scientific discovery and trustworthy AI in high-stakes domains.

Abstract

Graph Neural Networks (GNNs) have become essential in high-stakes domains such as drug discovery, yet their black-box nature remains a significant barrier to trustworthiness. While self-explainable GNNs attempt to bridge this gap, they often rely on standard message-passing backbones that inherit fundamental limitations, including the 1-Weisfeiler-Lehman (1-WL) expressivity barrier and a lack of fine-grained interpretability. To address these challenges, we propose SymGraph, a symbolic framework designed to transcend these constraints. By replacing continuous message passing with discrete structural hashing and topological role-based aggregation, our architecture theoretically surpasses the 1-WL barrier, achieving superior expressiveness without the overhead of differentiable optimization. Extensive empirical evaluations demonstrate that SymGraph achieves state-of-the-art performance, outperforming existing self-explainable GNNs. Notably, SymGraph delivers 10x to 100x speedups in training time using only CPU execution. Furthermore, SymGraph generates rules with superior semantic granularity compared to existing rule-based methods, offering great potential for scientific discovery and explainable AI.
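
The 1-WL barrier mentioned in the abstract can be seen on a textbook pair of graphs: a 6-cycle and two disjoint triangles. The sketch below is a minimal illustration, not SymGraph's method. It assumes unlabeled nodes and uses a plain triangle count as a stand-in for a structure-aware signal; the helper names `cycle`, `wl_colors`, and `triangle_count` are hypothetical.

```python
from collections import Counter


def cycle(nodes):
    """Adjacency sets for a simple cycle over the given node ids."""
    n = len(nodes)
    return {nodes[i]: {nodes[(i - 1) % n], nodes[(i + 1) % n]} for i in range(n)}


def wl_colors(adj, rounds=3):
    """Run 1-WL colour refinement and return the histogram of final colours."""
    colors = {v: 0 for v in adj}  # assumption: no node features, uniform start
    for _ in range(rounds):
        # New colour = (own colour, sorted multiset of neighbour colours).
        # Colours are kept as nested tuples so they are comparable across graphs.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


def triangle_count(adj):
    """Count triangles; a simple stand-in for a structure-aware signal."""
    hits = sum(
        1
        for v in adj
        for u in adj[v]
        for w in adj[v]
        if u < w and w in adj[u]
    )
    return hits // 3  # each triangle is seen once from each of its 3 nodes


hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}

same_under_wl = wl_colors(hexagon) == wl_colors(two_triangles)
print(same_under_wl, triangle_count(hexagon), triangle_count(two_triangles))
```

Both graphs are 2-regular, so every node receives the same colour in every round and the two histograms coincide, while the triangle count separates them. Any method that encodes cycle structure explicitly can exploit this kind of gap.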

Paper Structure

This paper contains 55 sections, 3 theorems, 19 equations, 10 figures, and 8 tables.

Key Result

Proposition 3.3

The tuple predicate $p_j = (h_{j}, q_{j})$ distinguishes rooted subgraphs that are indistinguishable by the standard 1-WL test (and consequently standard MPNNs), specifically in cases where: (1) topological structures differ but generate identical 1-WL color refinements, or (2) topological structure

Figures (10)

  • Figure 1: Visualization of a Local Structural Predicate. The extracted predicate identifies specific node and edge constraints, providing superior semantic granularity compared to canonical subgraph-based explanations. See Appendix \ref{app:examples_read} for more details.
  • Figure 2: An overview of the SymGraph framework.
  • Figure 3: Baselines' explanations exhibit irrelevant rules and chemically invalid components.
  • Figure 4: SymGraph identifies expert-validated chemical motifs that provide ground-truth explanations for molecular properties. Our predicates enforce structural constraints on node and edge attributes, which are readily mapped to SMARTS patterns, as illustrated by the red highlights on the chemical compounds. This provides a direct bridge between learned logic and established domain knowledge.
  • Figure 5: Raw subgraph visualization via Node IDs. These figures display part of the raw output from the LogiX-GIN extraction process. The highlighted regions correspond to the internal node indices (IDs) identified in the DNF rules as critical for classification, prior to semantic mapping.
  • ...and 5 more figures

Theorems & Definitions (12)

  • Definition 3.1: Orbit-Aware Feature Vector $\mathbf{Z}_v$
  • Definition 3.2: Structure-Aware Predicate
  • Proposition 3.3
  • proof
  • Definition 3.4: Global Predicate Vocabulary $\mathcal{P}$
  • Definition 3.5: Predicate Count Vector $\mathbf{v}_G$
  • Proposition 3.6
  • proof
  • Theorem 3.1: Orbit Sorting Consistency
  • proof
  • ...and 2 more
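
The listing above names a Global Predicate Vocabulary $\mathcal{P}$ and a Predicate Count Vector $\mathbf{v}_G$ without reproducing their definitions. As a rough sketch only, the code below assumes that $\mathbf{v}_G$ records how many nodes of a graph satisfy each predicate in $\mathcal{P}$, and that a Q-DNF-style rule is a disjunction of conjunctions of count thresholds. The predicate names, the `count_vector` and `dnf_fires` helpers, and the example rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical vocabulary: each string stands in for one local predicate.
VOCABULARY = ["aromatic_N", "nitro_O", "carbonyl_C"]


def count_vector(node_predicates, vocabulary=VOCABULARY):
    """Count how many nodes satisfy each vocabulary predicate (assumed semantics)."""
    counts = Counter(p for preds in node_predicates for p in preds)
    return [counts[p] for p in vocabulary]


def dnf_fires(vector, rule, vocabulary=VOCABULARY):
    """Evaluate a rule given as a list of clauses (OR of ANDs).

    Each clause is a list of (predicate, min_count) pairs, all of which
    must hold for the clause to fire.
    """
    index = {p: i for i, p in enumerate(vocabulary)}
    return any(
        all(vector[index[p]] >= k for p, k in clause)
        for clause in rule
    )


# One set of satisfied predicates per node of a toy graph.
graph = [{"nitro_O"}, {"nitro_O"}, {"aromatic_N"}, set()]

# Fires if (at least 2 nitro_O) OR (at least 1 carbonyl_C AND at least 1 aromatic_N).
rule = [[("nitro_O", 2)], [("carbonyl_C", 1), ("aromatic_N", 1)]]

v = count_vector(graph)
print(v, dnf_fires(v, rule))
```

Because every step is a count or a threshold comparison, the decision can be traced back to the specific predicates and counts that triggered it.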