GraphViz2Vec: A Structure-aware Feature Generation Model to Improve Classification in GNNs

Shraban Kumar Chatterjee, Suman Kundu

TL;DR

GraphViz2Vec addresses the deficiency of traditional GNN initial embeddings in capturing local graph structure and the tendency toward over-smoothing by introducing a structure-aware feature-generation pipeline. It combines random-walk induced subgraph projection, energy-based visualization via Kamada-Kawai layouts, and DenseNet-based image modeling to produce node embeddings that preserve neighborhood structure. The approach, decoupled from the GNN, enables state-of-the-art or near state-of-the-art performance across node and link classification tasks with only two GNN layers, reducing complexity and training requirements. Empirical results across diverse datasets and 12 GNN models demonstrate consistent improvements, with notable gains on several benchmarks and the ability to scale through batching and non-end-to-end feature extraction.

Abstract

GNNs are widely used to solve various tasks, including node classification and link prediction. Most GNN architectures assume the initial embedding to be random or generated from popular distributions. These initial embeddings require multiple layers of transformation to converge into a meaningful latent representation. While a larger number of layers allows a node to accumulate information from a larger neighbourhood, it also introduces the problem of over-smoothing. In addition, GNNs are inept at representing structural information. For example, the output embedding of a node does not capture its participation in triangles. In this paper, we present a novel feature extraction methodology, GraphViz2Vec, that captures the structural information of a node's local neighbourhood to create meaningful initial embeddings for a GNN model. These initial embeddings help existing models achieve state-of-the-art results in various classification tasks. Further, they help the model produce the desired results with only two layers, which in turn reduces the problem of over-smoothing. The initial encoding of a node is obtained from an image classification model trained on multiple energy diagrams of its local neighbourhood. These energy diagrams are generated from the induced sub-graph of the nodes traversed by multiple random walks. The generated encodings increase the performance of existing models on classification tasks (with a mean increase of $4.65\%$ and $2.58\%$ for the node and link classification tasks, respectively), with some models achieving state-of-the-art results.
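
The first stage described in the abstract, collecting the nodes visited by several random walks and taking their induced sub-graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`random_walk`, `induced_subgraph`, `walk_subgraphs`, `triangles_at`), the uniform choice of the next neighbour, the reading of walk size as the number of visited nodes, and the toy graph are all assumptions made for illustration. The layout and image-model stages are omitted; a library routine such as NetworkX's `kamada_kawai_layout` could plausibly be used to draw each sub-graph.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk that visits at most `length` nodes (assumed walk rule)."""
    walk = [start]
    while len(walk) < length:
        neighbours = sorted(adj[walk[-1]])
        if not neighbours:  # isolated node: the walk cannot continue
            break
        walk.append(rng.choice(neighbours))
    return walk


def induced_subgraph(adj, nodes):
    """Keep only the edges whose two endpoints both lie in `nodes`."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}


def walk_subgraphs(adj, node, num_walks=4, walk_length=16, seed=0):
    """One induced sub-graph per random walk started at `node`."""
    rng = random.Random(seed)
    return [
        induced_subgraph(adj, random_walk(adj, node, walk_length, rng))
        for _ in range(num_walks)
    ]


def triangles_at(sub, node):
    """Number of triangles in `sub` that contain `node`."""
    nbrs = sub[node]
    return sum(1 for u in nbrs for v in nbrs if u < v and v in sub[u])


# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = {n: set() for e in edges for n in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

subgraphs = walk_subgraphs(adj, node=0, num_walks=3, walk_length=6)
```

Each sub-graph keeps every edge of the original graph between visited nodes, so local structure such as the triangle at node 0 survives whenever a walk visits all three of its vertices; `triangles_at(sub, 0)` reports whether a given sample captured it.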

Paper Structure

This paper contains 22 sections, 4 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Proposed Model.
  • Figure 2: Performance of the models on the Cora dataset as the size of the random walk changes. Here, RW@16 denotes a random walk of size 16.
  • Figure 3: Time (in hours) spent training the image model for each dataset, using one image per node. The label on each bar denotes the number of nodes in that dataset.
  • Figure 4: Change in accuracy (a) and loss (b) over epochs with different hidden feature sizes for all the two-layer models on the Cora dataset (the images are generated from a random walk of size 128).
  • Figure 5: Time (in seconds) taken by the different models, for different hidden states, to reach 600 epochs when trained on the generated features.

Theorems & Definitions (3)

  • Remark 2.1
  • Definition 1: Node Classification
  • Definition 2: Link Classification