BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning

Jihoon Ko, Shinhwan Kang, Taehyung Kwon, Heechan Moon, Kijung Shin

TL;DR

This paper defines four standard incremental settings (task-, class-, domain-, and time-incremental) for node-, link-, and graph-level problems, extending the previously explored scope, and develops BeGin, an easy-to-use and fool-proof framework for graph CL.

Abstract

Continual Learning (CL) is the process of continually learning a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text), for which many benchmark frameworks and results under standard experimental settings are available. In comparison, CL methods for graph data (graph CL) are relatively underexplored because of (a) the lack of standard experimental settings, especially regarding how to deal with the dependency between instances, (b) the lack of benchmark datasets and scenarios, and (c) the high complexity of implementation and evaluation caused by that dependency. In this paper, regarding (a), we define four standard incremental settings (task-, class-, domain-, and time-incremental) for node-, link-, and graph-level problems, extending the previously explored scope. Regarding (b), we provide 35 benchmark scenarios based on 24 real-world graphs. Regarding (c), we develop BeGin, an easy-to-use and fool-proof framework for graph CL. BeGin is easy to extend because it is built from reusable modules for data processing, algorithm design, and evaluation. In particular, the evaluation module is completely separated from user code to eliminate potential mistakes. Regarding benchmark results, we cover 3x more combinations of incremental settings and levels of problems than the latest benchmark. All assets for the benchmark framework are publicly available at https://github.com/ShinhwanKang/BeGin.
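
As a rough illustration of the class-incremental setting named above, the sketch below partitions labeled nodes into a sequence of tasks so that each task introduces a disjoint group of classes. The function name `split_class_incremental`, the `classes_per_task` parameter, and the toy labels are hypothetical and are not part of BeGin's API. The sketch also ignores the edges between nodes, which is the very dependency that a graph CL setting has to specify.

```python
from collections import defaultdict


def split_class_incremental(labels, classes_per_task):
    """Group node indices into tasks; each task introduces new classes.

    labels: list where labels[i] is the class of node i (hypothetical input).
    classes_per_task: number of previously unseen classes per task.
    Returns a list of tasks, each a sorted list of node indices.
    """
    if classes_per_task < 1:
        raise ValueError("classes_per_task must be at least 1")

    nodes_by_class = defaultdict(list)
    for node, label in enumerate(labels):
        nodes_by_class[label].append(node)

    # Use a fixed class order so the task sequence is reproducible.
    ordered_classes = sorted(nodes_by_class)
    tasks = []
    for start in range(0, len(ordered_classes), classes_per_task):
        group = ordered_classes[start:start + classes_per_task]
        tasks.append(sorted(n for c in group for n in nodes_by_class[c]))
    return tasks


# Toy example: 8 nodes, 4 classes, 2 new classes per task.
toy_labels = [0, 2, 1, 3, 0, 1, 2, 3]
toy_tasks = split_class_incremental(toy_labels, classes_per_task=2)
```

Here the first task holds the nodes of classes 0 and 1, and the second task holds the nodes of classes 2 and 3. A time-incremental variant could group nodes by timestamp instead of by class.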

Paper Structure

This paper contains 31 sections, 2 equations, 12 figures, 13 tables.

Figures (12)

  • Figure 1: Examples of graph continual learning problems.
  • Figure 2: Motivating data analysis: temporal dynamics in real-world graphs. In the $\mathsf{ogbn}$-$\mathsf{arxiv}$ dataset (see Section \ref{sec:scenarios:examples} for details), we observe a gradual increase over time in four aspects (a)-(d). Additionally, the class distribution varies over time, accompanied by the emergence of new classes, as depicted in (e). Moreover, in (f), where t-SNE \cite{van2008visualizing} is used for dimensionality reduction, the average node features exhibit shifts over time, as indicated by the directional arrows.
  • Figure 3: Example communications between the trainer (user code) and the loader.
  • Figure 4: (Left) Modularized structure of BeGin, our proposed benchmark framework for implementing and evaluating continual learning methods for graph data. (Right) An example implementation of EWC with BeGin. To implement and benchmark new graph CL methods, users only need to fill in the modularized event functions in the trainer, which then runs the training procedure using those event functions. Appendix \ref{sec:app:ewc} explains in detail how to implement EWC with BeGin.
  • Figure 5: Change of Average Performance (AP) during continual learning. NC: Node Classification. LC: Link Classification. LP: Link Prediction. GC: Graph Classification. Note that the Joint model, which is trained using the entire dataset together, sometimes (e.g., in (d)) suffers from instability, especially when training samples for some classes are very limited in Time-IL. The full results for all considered scenarios are available in Appendix \ref{sec:app:add_results}.
  • ...and 7 more figures
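
Figure 5 tracks Average Performance (AP) as tasks arrive. The sketch below shows one way such a curve can be computed, assuming the common continual-learning definition in which AP after the k-th task is the mean performance over the tasks seen so far. The matrix layout, the function name `average_performance_curve`, and the numbers are illustrative assumptions, not code or results from the paper.

```python
def average_performance_curve(perf):
    """Compute one AP value per training step.

    perf[k][i] is the performance on task i measured after training on
    task k (0-indexed). Only entries with i <= k are used.
    """
    curve = []
    for k, row in enumerate(perf):
        if len(row) <= k:
            raise ValueError("row k needs at least k + 1 entries")
        seen = row[:k + 1]  # tasks encountered so far
        curve.append(sum(seen) / len(seen))
    return curve


# Made-up accuracies for three tasks; entries above the diagonal are unused.
toy_perf = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.60, 0.90],
]
toy_curve = average_performance_curve(toy_perf)
```

With these made-up numbers the curve falls from 0.90 to roughly 0.75 and then 0.70, the kind of decline that forgetting of earlier tasks would produce.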