Data Augmentation in Graph Neural Networks: The Role of Generated Synthetic Graphs

Sumeyye Bas, Kiymet Kaya, Resul Tugay, Sule Gunduz Oguducu

TL;DR

This paper tackles data scarcity and privacy concerns in graph-based predictive modeling by proposing a size-aware data augmentation framework that blends real and synthetic graphs. The approach trains per-class generators and selects GRAN for large graphs and GraphRNN for small graphs to generate synthetic graphs, which are used alongside real graphs to augment training. Experiments across six public datasets demonstrate that incorporating synthetic data improves graph classification performance, with GRAN excelling on larger graphs and GraphRNN on smaller ones, and with balanced augmentation mitigating data scarcity and imbalance. The work provides a practical, scalable pathway to enhance GNN performance and suggests directions for explainability and dynamic-graph extensions in future research.

Abstract

Graphs are crucial for representing interrelated data and aiding predictive modeling by capturing complex relationships. High-quality graph representations are important for identifying linked patterns, which has driven improvements in Graph Neural Networks (GNNs) to better capture data structures. However, challenges such as data scarcity, high collection costs, and ethical concerns limit progress. As a result, generative models and data augmentation have become increasingly popular. This study explores using generated graphs for data augmentation, comparing the performance of combining generated graphs with real graphs, and examining the effect of different quantities of generated graphs on graph classification tasks. The experiments show that balancing scalability and quality requires different generators based on graph size. Our results introduce a new approach to graph data augmentation, ensuring consistent labels and enhancing classification performance.
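
The size-aware selection summarized in the TL;DR (GRAN for large graphs, GraphRNN for small graphs, one generator per class, balanced augmentation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SIZE_THRESHOLD` cutoff, the use of the mean node count as the size measure, the "match the largest class" balancing rule, and all function names are hypothetical. The sketch only plans the augmentation; it does not call GRAN or GraphRNN.

```python
from collections import Counter
from statistics import mean

# Hypothetical cutoff: the source does not state the node count that
# separates "small" from "large" graphs.
SIZE_THRESHOLD = 100


def choose_generator(node_counts, threshold=SIZE_THRESHOLD):
    """Pick a generator name from the mean node count of a dataset (assumed criterion)."""
    return "GRAN" if mean(node_counts) > threshold else "GraphRNN"


def synthetic_quota(labels):
    """Synthetic graphs needed per class so every class matches the largest one (assumed rule)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def augmentation_plan(node_counts, labels, threshold=SIZE_THRESHOLD):
    """Map each class to (generator name, number of synthetic graphs to request)."""
    generator = choose_generator(node_counts, threshold)
    return {cls: (generator, k) for cls, k in synthetic_quota(labels).items()}


if __name__ == "__main__":
    # Four real graphs: three in class 0, one in class 1.
    plan = augmentation_plan(node_counts=[500, 700, 600, 800], labels=[0, 0, 0, 1])
    print(plan)
```

Each entry in the plan would then be handed to a generator trained only on that class, so every synthetic graph inherits its class label by construction.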

Paper Structure

This paper contains 11 sections, 10 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Graph Classification with Graph Size-Aware Data Augmentation.
  • Figure 2: Proposed Graph Classification with Graph Size-aware Data Augmentation Framework Results Summary.