Self-supervised Subgraph Neural Network With Deep Reinforcement Walk Exploration

Jianming Huang, Hiroyuki Kasai

TL;DR

This work tackles the limited expressivity and interpretability of traditional GNNs by uniting subgraph neural networks with data-driven GNN explainers in a self-supervised framework. The authors introduce RWE-SGNN, which uses a reinforcement walk exploration (RWE) based MDP to efficiently generate informative substructures, paired with a two-stage training regime that alternates optimizing the sampling and output models under downstream losses. They prove that the walk-based generation has equivalent substructure-generation capability to conventional subgraph methods and demonstrate substantial gains in accuracy and explainability on multiple graph-classification benchmarks. The approach reduces computational complexity from quadratic to linear in the substructure generation step and provides tangible visual explanations for the extracted subgraphs, offering a practical pathway to more powerful and interpretable graph learning systems.

Abstract

Graph data, with its structurally variable nature, represents complex real-world phenomena like chemical compounds, protein structures, and social networks. Traditional Graph Neural Networks (GNNs) primarily utilize the message-passing mechanism, but their expressive power is limited and their predictions lack explainability. To address these limitations, researchers have focused on graph substructures. Subgraph neural networks (SGNNs) and GNN explainers have emerged as potential solutions, but each has its limitations. SGNNs compute graph representations based on bags of subgraphs to enhance expressive power. However, they often rely on predefined algorithm-based sampling strategies, which are inefficient. GNN explainers adopt data-driven approaches to generate important subgraphs that explain a prediction. Nevertheless, their explanations are difficult to translate into practical improvements for GNNs. To overcome these issues, we propose a novel self-supervised framework that integrates SGNNs with the generation approach of GNN explainers, named the Reinforcement Walk Exploration SGNN (RWE-SGNN). Our approach features a sampling model trained in an explainer fashion, optimizing subgraphs to enhance model performance. To achieve a data-driven sampling approach, unlike traditional subgraph generation approaches, we propose a novel walk exploration process, which efficiently extracts important substructures, simplifying the embedding process and avoiding isomorphism problems. Moreover, we prove that our proposed walk exploration process has generation capability equivalent to that of the traditional subgraph generation process. Experimental results on various graph datasets validate the effectiveness of our proposed method, demonstrating significant improvements in performance and precision.
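To make the idea of extracting a substructure through a walk more concrete, here is a minimal sketch. It is not the authors' sampling model: a uniform random choice stands in for the learned reinforcement policy, and the names `sample_walk`, `walk_to_substructure`, and the toy adjacency list `adj` are hypothetical, introduced only for illustration.

```python
import random


def sample_walk(adj, start, length, rng):
    """Take up to `length` steps from `start`.

    Assumption: the next node is chosen uniformly at random. In a learned
    setting this choice would come from a trained policy instead.
    """
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def walk_to_substructure(walk):
    """Return the nodes visited and the undirected edges traversed by a walk."""
    nodes = set(walk)
    edges = {frozenset((u, v)) for u, v in zip(walk, walk[1:])}
    return nodes, edges


# Toy graph (hypothetical): a 4-cycle 0-1-2-3 with a pendant node 4 attached to 0.
adj = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3], 3: [2, 0], 4: [0]}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
walk = sample_walk(adj, start=0, length=6, rng=rng)
nodes, edges = walk_to_substructure(walk)
print("walk:", walk)
print("nodes:", sorted(nodes))
print("edges:", sorted(tuple(sorted(e)) for e in edges))
```

The substructure is represented here as a single node sequence, so it can be passed to a sequence encoder directly. A walk of `length` steps performs one neighbor choice per step.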

Paper Structure

This paper contains 13 sections, 1 theorem, 7 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Given a graph $G(\mathcal{V},\mathcal{E})$. For any connected subgraph $G^\prime(\mathcal{V}^\prime,\mathcal{E}^\prime)$ of $G$, let $\mathcal{W}_{G^\prime}$ denote the complete random walk set on $G^\prime$, which includes all possible random walk sequences of arbitrary length. There always exists

Figures (6)

  • Figure 1: Illustration of the self-supervised framework, where solid arrows represent the direction of data flow and dashed arrows represent the relationship between the two models. The sampling model creates a bag of substructures (nodes with red lines denote the substructure) for the output model, and can be considered an explainer of the output model. Conversely, the output model computes predictions, which are used to supervise the training of the sampling model.
  • Figure 2: Illustration of the sampling model of RWE-SGNN, which shows the process of extracting the important substructure from a graph. The red region highlights the important substructure area. Initially, the low-level graphlet perceptrons detect the key graphlets and assign graphlet-aware embeddings to all the nodes. Subsequently, with these graphlet-aware embeddings, the high-level reinforcement walk exploration framework extracts nodes of interest and generates a walk sequence specific to the important substructure area. Finally, this walk sequence is fed into a sequence encoder to compute embeddings for the bags of substructures.
  • Figure 3: Illustration of the model architecture, which comprises two main components: the sampling model and the output model. Initially, the input graph data are processed by the sampling model, which generates important substructures in a data-driven manner. Subsequently, leveraging the generated walk sequence, the output model encodes the extracted substructures for downstream tasks.
  • Figure 4: Test accuracy of walk-based MDP for varying trajectory lengths and varying sample numbers on the NCI109 dataset. The x-axis represents the number of epochs, while the y-axis corresponds to the classification accuracy (with a maximum value of $1$ denoting $100\%$).
  • Figure 5: The extracted subgraphs at different epochs on the MUTAG dataset. The nodes in orange denote the extracted nodes.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 1: Subgraph optimization problem
  • Definition 2: Subgraph-generation-based MDP
  • Definition 3: Walk-exploration-based MDP
  • Theorem 1
  • proof
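
Theorem 1 is stated in terms of walks on a connected subgraph $G^\prime$. As background intuition only, and not a reproduction of the theorem or its proof, the sketch below illustrates a standard graph fact: every connected graph admits a single closed walk that traverses all of its edges, obtained by depth-first search with backtracking. The function name `covering_walk` and the toy adjacency list `sub_adj` are hypothetical.

```python
def covering_walk(sub_adj, start):
    """Closed walk covering every edge of a connected undirected graph.

    Depth-first search with backtracking: each edge is crossed once going
    forward and once coming back, so the walk has 2 * |E| steps.
    """
    walk = [start]
    used = set()  # undirected edges already crossed

    def dfs(u):
        for v in sub_adj[u]:
            edge = frozenset((u, v))
            if edge in used:
                continue
            used.add(edge)
            walk.append(v)  # step forward along the edge
            dfs(v)
            walk.append(u)  # backtrack along the same edge

    dfs(start)
    return walk


# Toy connected subgraph (hypothetical): triangle 0-1-2 with pendant node 3 on node 2.
sub_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = covering_walk(sub_adj, start=0)

all_edges = {frozenset((u, v)) for u in sub_adj for v in sub_adj[u]}
walk_edges = {frozenset((u, v)) for u, v in zip(walk, walk[1:])}
print("walk:", walk)
print("covers all edges:", walk_edges == all_edges)
```

The recursion is fine for small examples like this one; for large subgraphs an explicit stack would avoid Python's recursion limit.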