A Survey of Data-Efficient Graph Learning
Wei Ju, Siyu Yi, Yifan Wang, Qingqing Long, Junyu Luo, Zhiping Xiao, Ming Zhang
TL;DR
This survey introduces Data-Efficient Graph Learning (DEGL) as a framework for graph learning with limited labeled data. It organizes existing work into three pillars: self-supervised, semi-supervised, and few-shot learning. Each pillar is further divided into concrete methodological families: generation-based, contrastive-based, and auxiliary-property-based methods within self-supervised learning; classical label propagation, consistency regularization, and pseudo-labeling within semi-supervised learning; and metric-based and optimization-based approaches within few-shot learning. The paper highlights representative methods, key objective formulations, and practical trade-offs across these paradigms, and discusses domain shift, integration with large models, and non-Euclidean convolution as future directions. By mapping diverse techniques onto a coherent DEGL taxonomy, the work clarifies how researchers can design data-efficient graph models that maintain performance with scarce annotations, enabling broader applicability in domains such as biology and social networks. It also points to theoretical efforts to bound learnability and efficiency, suggesting a path toward robust, explainable, and scalable data-efficient graph learning.
Abstract
Graph-structured data, prevalent in domains ranging from social networks to biochemical analysis, serve as the foundation for diverse real-world systems. While graph neural networks demonstrate proficiency in modeling this type of data, their success often relies on significant amounts of labeled data, posing a challenge in practical scenarios with limited annotation resources. To tackle this problem, tremendous efforts have been devoted to enhancing graph machine learning performance under low-resource settings by exploring various approaches to minimal supervision. In this paper, we introduce a novel concept of Data-Efficient Graph Learning (DEGL) as a research frontier, and present the first survey that summarizes the current progress of DEGL. We begin by highlighting the challenges inherent in training models with large amounts of labeled data, paving the way for our exploration of DEGL. Next, we systematically review recent advances on this topic from several key aspects, including self-supervised graph learning, semi-supervised graph learning, and few-shot graph learning. Finally, we outline promising directions for future research, contributing to the evolution of graph machine learning.
