Data Augmentation on Graphs: A Technical Survey

Jiajun Zhou, Chenxuan Xie, Shengbo Gong, Zhenyu Wen, Xiangyu Zhao, Qi Xuan, Xiaoniu Yang

TL;DR

This survey addresses the challenge of improving graph data quality through data augmentation. It proposes a six-scale taxonomy (feature, node, edge, subgraph, graph, and label levels) and details standardized definitions, mechanisms, and domain-specific techniques. It consolidates general GDAug methods and extends them to heterogeneous graphs, temporal graphs, spatio-temporal graphs, and hypergraphs, while also presenting evaluation metrics, design guidelines, and practical applications at the data and model levels. The work highlights open issues such as interpretability, scalability, and the need for comprehensive evaluation frameworks to compare augmentation strategies across tasks and domains. Collectively, the survey provides a unified, technically grounded reference to guide researchers and practitioners in designing effective GDAug pipelines and advancing graph representation learning.

Abstract

In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology for improving data quality in computer vision, data augmentation has also attracted increasing attention in the graph domain. To advance research in this emerging direction, this survey provides a comprehensive review and summary of existing graph data augmentation (GDAug) techniques. Specifically, this survey first provides an overview of various feasible taxonomies and categorizes existing GDAug studies based on multi-scale graph elements. Subsequently, for each type of GDAug technique, this survey formalizes a standardized technical definition, discusses the technical details, and provides a schematic illustration. The survey also reviews domain-specific graph data augmentation techniques, including those for heterogeneous graphs, temporal graphs, spatio-temporal graphs, and hypergraphs. In addition, this survey provides a summary of available evaluation metrics and design guidelines for graph data augmentation. Lastly, it outlines the applications of GDAug at both the data and model levels, discusses open issues in the field, and looks forward to future directions. The latest advances in GDAug are summarized on GitHub.

Paper Structure

This paper contains 55 sections, 17 equations, 11 figures, and 11 tables.

Figures (11)

  • Figure 1: An overview framework of graph data augmentation in graph representation learning.
  • Figure 2: Illustration of feature shuffling augmentation.
  • Figure 3: Illustration of different feature masking augmentations.
  • Figure 4: Illustration of node dropping augmentation.
  • Figure 5: Illustration of generalized node mixup augmentation.
  • ...and 6 more figures