
Data-centric Graph Learning: A Survey

Yuxin Guo, Deyu Bo, Cheng Yang, Zhiyuan Lu, Zhongjian Zhang, Jixi Liu, Yufei Peng, Chuan Shi

TL;DR

A novel taxonomy based on the stages of the graph learning pipeline is proposed, and processing methods are highlighted for the different components of graph data, i.e., topology, features, and labels.

Abstract

The history of artificial intelligence (AI) has shown the significant impact of high-quality data on deep learning models, as with ImageNet for AlexNet and ResNet. Recently, the attention of the AI community has shifted from model-centric approaches, which design ever more complex neural architectures, to data-centric ones, which focus on better processing data to strengthen the ability of neural models. Graph learning, which operates on ubiquitous topological data, also plays an important role in the era of deep learning. In this survey, we comprehensively review graph learning approaches from the data-centric perspective and aim to answer three crucial questions: (1) when to modify graph data, (2) which parts of the graph data need modification to unlock the potential of various graph models, and (3) how to safeguard graph models from the influence of problematic data. Accordingly, we propose a novel taxonomy based on the stages of the graph learning pipeline and highlight the processing methods for the different components of graph data, i.e., topology, features, and labels. Furthermore, we analyze potential problems embedded in graph data and discuss how to solve them in a data-centric manner. Finally, we outline promising future directions for data-centric graph learning.

Paper Structure (39 sections, 9 equations, 1 figure, 1 table)

Figures (1)

  • Figure 1: Pipeline of data-centric graph learning. The first step is to construct different graphs from the data as needed. The graph structure, node features, or labels are then pre-processed to facilitate the learning of graph models. During the training phase, graph data is collaboratively processed with the task-solving model to improve its performance. Ultimately, prompts are designed to imbue the graph models with enhanced predictive capabilities during the inference stage.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2