On the Power of Graph Neural Networks and Feature Augmentation Strategies to Classify Social Networks

Walid Guettala, László Gulyás

TL;DR

This work investigates graph classification on synthetic social networks by evaluating four GNN architectures (GCN with Hierarchical and Global pooling, GIN, GATv2) across five node-feature augmentation strategies (Ones, Noise, Degree, Norm Degree, and Identity, i.e. the ID feature: a vector of cycle counts of various lengths). All models operate on featureless graphs augmented with these artificial features and are trained with a shared classifier head, across a grid of hidden dimensions $H$, to distinguish eight network families generated by classic Network Science models. The study finds that high-capacity architectures (GIN, GATv2) generally perform well across augmentations, while informative features like Identity and Degree provide the strongest gains and can compensate for lower model complexity; the Hierarchical pooling architecture underperforms the other three. The results highlight a balance between architectural power and feature informativeness, with implications for graph classification on real networks after training on diverse synthetic benchmarks.

Abstract

This paper studies four Graph Neural Network (GNN) architectures for a graph classification task on a synthetic dataset created using classic generative models of Network Science. Since the synthetic networks do not contain (node or edge) features, five different augmentation strategies (artificial feature types) are applied to nodes. All combinations of the 4 GNNs (GCN with Hierarchical and Global aggregation, GIN and GATv2) and the 5 feature types (constant 1, noise, degree, normalized degree and ID -- a vector of the number of cycles of various lengths) are studied and their performances compared as a function of the hidden dimension of the artificial neural networks used in the GNNs. The generalisation ability of these models is also analysed using a second synthetic network dataset (containing networks of different sizes). Our results point towards the balanced importance of the computational power of the GNN architecture and the information level provided by the artificial features. GNN architectures with higher computational power, like GIN and GATv2, perform well for most augmentation strategies. On the other hand, artificial features with higher information content, like ID or degree, not only consistently outperform other augmentation strategies, but can also help GNN architectures with lower computational power to achieve good performance.
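
The five augmentation strategies named in the abstract can be illustrated with a minimal sketch in plain Python. Several details here are assumptions rather than the authors' definitions: the function names `augment` and `closed_walk_counts` are hypothetical, noise is drawn uniformly from [0, 1), the normalized degree divides by the maximum degree in the graph, and the ID feature is approximated by counting closed walks of length 1 to `max_len` that return to the node (a simple stand-in for cycle counts, which the abstract does not define precisely).

```python
import random


def closed_walk_counts(adj, start, max_len):
    """Count closed walks of length 1..max_len that start and end at `start`.

    Assumption: closed walks are used as a simple proxy for the
    "number of cycles of various lengths" mentioned in the abstract.
    """
    walks = {start: 1}  # walks[u] = number of walks of the current length from start to u
    counts = []
    for _ in range(max_len):
        nxt = {}
        for u, n in walks.items():
            for w in adj[u]:
                nxt[w] = nxt.get(w, 0) + n
        walks = nxt
        counts.append(float(walks.get(start, 0)))
    return counts


def augment(adj, strategy, max_len=4, seed=0):
    """Return {node: feature vector} for a featureless undirected graph.

    adj maps each node to the set of its neighbours.
    """
    nodes = sorted(adj)
    deg = {v: len(adj[v]) for v in nodes}
    if strategy == "ones":
        return {v: [1.0] for v in nodes}
    if strategy == "noise":
        rng = random.Random(seed)  # assumed distribution: uniform on [0, 1)
        return {v: [rng.random()] for v in nodes}
    if strategy == "degree":
        return {v: [float(deg[v])] for v in nodes}
    if strategy == "norm_degree":
        max_deg = max(deg.values(), default=0) or 1  # assumed normalisation: divide by max degree
        return {v: [deg[v] / max_deg] for v in nodes}
    if strategy == "id":
        return {v: closed_walk_counts(adj, v, max_len) for v in nodes}
    raise ValueError(f"unknown strategy: {strategy}")


# Usage: a triangle and a three-node path
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(augment(triangle, "id"))
print(augment(path, "norm_degree"))
```

In this sketch, the constant and noise features carry no structural information on their own, while the degree-based and ID features encode local structure directly, which is the distinction the abstract draws between low and high information content.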

Paper Structure (9 sections, 3 figures, 4 tables)

This paper contains 9 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The generic structure of the models studied. The input graphs have no feature information, so nodes are augmented by artificial features (varying across the studies). The resulting network is passed through one of the recent graph embedding architectures (GNNs, also varying across the studies) and then classified into one of the 8 classes defined in Section \ref{sec:dataset}.
  • Figure 2: The high-level architectures of the 4 graph embedding architectures studied. GIN is on the far left, followed by GATv2. The Hierarchical and Global architectures are on the right (in this order). Each architecture is depicted with $K=2$ layers, where $K$ is a hyperparameter that is also explored in the study. The output of the embedding is fed to the classification head. BN denotes Batch normalization, Readout stands for the output of the pooling, and Linear means a single layer of neurons. The meaning of GCN, GIN, GATv2 layers and Graph Pooling is described in the text.
  • Figure 3: Performance of the various augmentation strategies and embedding architectures as a function of $H$ for both the testing subset of the Small Dataset (used for training) and for the Medium Dataset (to assess generalisation ability). The GIN and GATv2 architectures perform well for most feature types and also generalise acceptably. The Global architecture only shows good accuracy at larger hidden dimensions and generalises only for some feature types. The Hierarchical architecture is poor both in terms of performance and generalisation. The panels report on the GIN (left) and GATv2 (right) architectures in the top row and on the Global (left) and Hierarchical (right) architectures at the bottom. The accuracies obtained for the Small Dataset are plotted in black, while the results for the Medium Dataset are marked with gray and dashed lines. Augmentation strategies are distinguished by different markers -- the same marker is used for the same feature type in case of both datasets. The accuracy values shown are averages over 5 trials with independent random weight initialisations, each obtained with $K=4$, a batch size of 100 and a dropout rate of $0.5$, using the ADAM optimizer for 100 epochs with a weight decay of $10^{-3}$ and a learning rate of 0.01.
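
The training protocol stated in the Figure 3 caption can be written down as a small configuration sketch. The hyperparameter values below are taken from the caption; everything else is an assumption made for illustration: the names `TrainConfig`, `mean_accuracy` and `sweep` are hypothetical, `run_trial` is a placeholder for whatever training routine is used, and the grid of hidden dimensions $H$ is left to the caller because the caption does not list its values.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

ARCHITECTURES = ("GIN", "GATv2", "Global", "Hierarchical")
FEATURES = ("ones", "noise", "degree", "norm_degree", "id")


@dataclass(frozen=True)
class TrainConfig:
    hidden_dim: int               # H, swept in Figure 3 (grid values not given in the caption)
    num_layers: int = 4           # K
    batch_size: int = 100
    dropout: float = 0.5
    epochs: int = 100
    weight_decay: float = 1e-3
    learning_rate: float = 0.01
    optimizer: str = "adam"
    num_trials: int = 5           # independent random weight initialisations


def mean_accuracy(run_trial, arch, feature, cfg):
    """Average accuracy over cfg.num_trials seeds.

    run_trial is a hypothetical callable (arch, feature, cfg, seed) -> accuracy.
    """
    return mean(run_trial(arch, feature, cfg, seed) for seed in range(cfg.num_trials))


def sweep(run_trial, hidden_dims):
    """Evaluate every architecture x feature x hidden-dimension combination."""
    return {
        (arch, feature, h): mean_accuracy(run_trial, arch, feature, TrainConfig(hidden_dim=h))
        for arch, feature, h in product(ARCHITECTURES, FEATURES, hidden_dims)
    }


# Usage with a dummy trial function standing in for real training
def dummy_trial(arch, feature, cfg, seed):
    return seed / 10


results = sweep(dummy_trial, hidden_dims=[8, 16])  # example grid, not the paper's
print(len(results))
```

With 4 architectures and 5 feature types, each hidden dimension adds 20 averaged accuracy values, one per curve point in the figure for a given dataset.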