Efficient and Robust Continual Graph Learning for Graph Classification in Biology

Ding Zhang, Jane Downer, Can Chen, Ren Wang

TL;DR

The paper tackles graph-level continual learning for biology, where models must learn from evolving tasks without forgetting earlier knowledge. It introduces PSCGL, a framework that combines memory replay with perturbed graph sampling, motif-based sparsification, and consistency training to improve efficiency, robustness, and scalability. PSCGL not only sustains knowledge across tasks but also defends against graph backdoor attacks, achieving superior average performance and lower forgetting on Enzymes and Aromaticity, while reducing storage and computation through sparsification. This approach has practical implications for reliable biological graph analysis in dynamic settings, including drug discovery and enzyme function prediction, where data evolve and security concerns are paramount.

Abstract

Graph classification is essential for understanding complex biological systems, where molecular structures and interactions are naturally represented as graphs. Traditional graph neural networks (GNNs) perform well on static tasks but struggle in dynamic settings due to catastrophic forgetting. We present Perturbed and Sparsified Continual Graph Learning (PSCGL), a robust and efficient continual graph learning framework for graph data classification, specifically targeting biological datasets. We introduce a perturbed sampling strategy to identify critical data points that contribute to model learning and a motif-based graph sparsification technique to reduce storage needs while maintaining performance. Additionally, our PSCGL framework inherently defends against graph backdoor attacks, which is crucial for applications in sensitive biological contexts. Extensive experiments on biological datasets demonstrate that PSCGL not only retains knowledge across tasks but also enhances the efficiency and robustness of graph classification models in biology.

Paper Structure

This paper contains 25 sections, 5 equations, 1 figure, 3 tables, 2 algorithms.

Figures (1)

  • Figure 1: Overview of the proposed Perturbed and Sparsified Continual Graph Learning (PSCGL) framework at tasks $t-1$ and $t$. The framework includes a memory buffer (blue rectangles) that stores representative graph data from previous tasks for retraining. During task $t-1$, the model $\text{GNN}_{t-1}$ (upper green rectangles) is trained with a consistency training scheme on the task $t-1$ data ${\mathcal{D}}_{t-1}$, representative data from the buffer ${\mathcal{B}}$, and their augmented versions. After training, graphs from ${\mathcal{D}}_{t-1}$ are sampled using perturbed graph sampling with the trained model $\text{GNN}_{t-1}$. The sampled graphs then undergo the proposed graph sparsification process to reduce their size. Finally, the sparsified graphs ${\mathcal{B}}_{t-1}$ are stored in the memory buffer. Training $\text{GNN}_{t}$ (lower green rectangles) at task $t$ follows the same procedure as for $\text{GNN}_{t-1}$.
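The buffer update described in the caption (sample graphs with the trained model, sparsify them, store them) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's algorithm: the selection score (mean change in model output under random edge drops), the choice of triangles as the motif, and all names (`sensitivity`, `sparsify`, `update_buffer`, `toy_model`) are assumptions made for the example. Graphs are plain edge sets of `(u, v)` tuples with `u < v`, and the consistency training step is not shown.

```python
import random


def drop_edges(edges, rate, rng):
    """Return a copy of the edge set with roughly `rate` of the edges removed."""
    return {e for e in edges if rng.random() >= rate}


def sensitivity(model, edges, rate, trials, rng):
    """Mean absolute change in the model score under random edge drops.

    ASSUMPTION: used here as a stand-in criterion for perturbed sampling.
    """
    base = model(edges)
    total = sum(abs(model(drop_edges(edges, rate, rng)) - base) for _ in range(trials))
    return total / trials


def sparsify(edges):
    """Keep only edges that lie on a triangle (ASSUMPTION: triangle as the motif).

    Falls back to the full edge set when the graph has no triangle.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # An edge (u, v) is on a triangle iff u and v share a neighbor.
    kept = {(u, v) for u, v in edges if adj[u] & adj[v]}
    return kept if kept else set(edges)


def update_buffer(buffer, task_graphs, model, budget, rate=0.2, trials=8, seed=0):
    """Select `budget` graphs by sensitivity, sparsify them, append to the buffer."""
    rng = random.Random(seed)
    ranked = sorted(
        task_graphs,
        key=lambda g: sensitivity(model, g["edges"], rate, trials, rng),
        reverse=True,
    )
    for g in ranked[:budget]:
        buffer.append({"edges": sparsify(g["edges"]), "label": g["label"]})
    return buffer


def toy_model(edges):
    """Hypothetical stand-in for a trained GNN: score grows with edge count."""
    return len(edges) / 10.0
```

In a real setting `toy_model` would be replaced by the trained GNN's prediction for the graph, and the stored graphs would be replayed alongside the next task's data.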