HERO: Heterogeneous Continual Graph Learning via Meta-Knowledge Distillation

Guiquan Sun, Xikun Zhang, Jingchao Ni, Dongjin Song

Abstract

Heterogeneous graph neural networks have seen rapid progress in web applications such as social networks, knowledge graphs, and recommendation systems, driven by the inherent heterogeneity of web data. However, existing methods typically assume static graphs, while real-world graphs are continuously evolving. This dynamic nature requires models to adapt to new data while preserving existing knowledge. To this end, this work introduces HERO (HEterogeneous continual gRaph learning via meta-knOwledge distillation), a unified framework for continual learning on heterogeneous graphs. HERO employs meta-adaptation, a gradient-based meta-learning strategy that provides directional guidance for rapid adaptation to new tasks with limited samples. To enable efficient and effective knowledge reuse, we propose DiSCo (Diversity Sampling with semantic Consistency), a heterogeneity-aware sampling method that maximizes target node diversity and expands subgraphs along metapaths, retaining critical semantic and structural information with minimal overhead. Furthermore, HERO incorporates heterogeneity-aware knowledge distillation, which aligns knowledge at both the node and semantic levels to balance adaptation and retention across tasks. Extensive experiments on four web-related heterogeneous graph benchmarks demonstrate that HERO substantially mitigates catastrophic forgetting while achieving efficient and consistent knowledge reuse in dynamic web environments.

Paper Structure

This paper contains 27 sections, 18 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: An example of an evolving heterogeneous graph. (a) Node types include Actor, Movie, and Director. (b) As the graph evolves, new domains emerge (e.g., movies of different genres such as Comedy/Romance and Thriller/Action).
  • Figure 2: The overall framework of HERO. (a) Diversity Sampling with semantic Consistency: Construct task-specific subgraphs by selecting diverse target-type nodes and expanding to related node types via relation-aware importance. (b) Heterogeneity-aware Knowledge Distillation: Align previous and current tasks via logit-level and semantic-level distillation. (c) Task Training: Use meta-learning for fast adaptation, jointly optimizing task loss, replay loss, and distillation loss.
  • Figure 3: Performance comparison on the Yelp dataset under two settings using HAN as the backbone model.
  • Figure 4: Visualization of accuracy matrices on the Yelp dataset under the 2-way setting.
  • Figure 5: Sensitivity analysis of HERO with respect to key hyperparameters. The shaded areas represent variances. From left to right: (1) distillation loss weight, (2) experience replay sampling budget, and (3) shot number used in meta-learning. Due to space constraints, we only report the Average Performance (AP) here.
  • ...and 3 more figures