Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning

Yan Fan; Yu Wang; Pengfei Zhu; Qinghua Hu

TL;DR

This work tackles semi-supervised continual learning (SSCL), where limited labeled data and uncertain unlabeled distributions cause unstable training and forgetting. It introduces Dynamic Sub-graph Distillation (DSGD), a graph-based framework that captures high-order structural information via dynamic topology graphs and $K$-order Personalized PageRank distillation vectors to stabilize learning on unlabeled data. The method forms two graphs for current and replayed data, defines a sub-graph distillation loss $\mathcal{L}_{SGD}$ to preserve local structure, and ensembles current and past predictions with a logistic-like weighting $\alpha$ for robust semi-supervision. Experiments on CIFAR-10/100 and ImageNet-100 across varying label ratios show that DSGD improves average and last incremental accuracy while reducing memory requirements, outperforming several SSCL baselines. Overall, DSGD provides a scalable, structure-aware approach that mitigates distribution bias and catastrophic forgetting in SSCL, with strong empirical gains and broad applicability to different continual learning settings.

Abstract

Continual learning (CL) has shown promising results, with performance comparable to learning all tasks at once in a fully supervised manner. However, CL strategies typically require a large number of labeled samples, which makes real-world deployment challenging. In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories. We provide a comprehensive analysis of SSCL and demonstrate that unreliable distributions of unlabeled data lead to unstable training and refinement in the progressing stages, which severely impacts SSCL performance. To address these limitations, we propose Dynamic Sub-graph Distillation (DSGD) for semi-supervised continual learning, which leverages both semantic and structural information to achieve more stable knowledge distillation on unlabeled data and to remain robust against distribution bias. First, we formalize a general model of structural distillation and design a dynamic graph construction for the continual learning process. Next, we define a structure distillation vector and design a dynamic sub-graph distillation algorithm, which enables end-to-end training and remains adaptable as the number of tasks scales up. The proposed method is adaptable to various CL methods and supervision settings. Finally, experiments on three datasets (CIFAR10, CIFAR100, and ImageNet-100) with varying supervision ratios demonstrate the effectiveness of our approach in mitigating catastrophic forgetting in semi-supervised continual learning scenarios.
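The idea of a structure distillation vector built from $K$-order Personalized PageRank can be illustrated with a small NumPy sketch. This is not the authors' implementation. The cosine-similarity graph, the softmax temperature, the reading of $\gamma$ as a restart probability, the squared-error loss, and all function names (`transition_matrix`, `k_order_ppr`, `subgraph_distillation_loss`) are assumptions made only for illustration.

```python
import numpy as np

def transition_matrix(features, temperature=0.1):
    """Row-stochastic matrix from cosine similarities (assumed graph construction)."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)          # no self-transitions
    sim -= sim.max(axis=1, keepdims=True)   # numerical stability for the softmax
    w = np.exp(sim)
    return w / w.sum(axis=1, keepdims=True)

def k_order_ppr(P, K=3, gamma=0.15):
    """Truncated Personalized PageRank: sum_{k=0..K} gamma * (1 - gamma)^k * P^k.

    Row i is the K-order vector for a walk seeded at node i.
    gamma is treated here as the restart probability (an assumption).
    """
    n = P.shape[0]
    walk = np.eye(n)            # P^0
    vectors = gamma * walk
    for k in range(1, K + 1):
        walk = walk @ P         # P^k
        vectors += gamma * (1 - gamma) ** k * walk
    return vectors

def subgraph_distillation_loss(old_features, new_features, K=3, gamma=0.15):
    """Mean squared gap between old and new per-node vectors (assumed loss form)."""
    v_old = k_order_ppr(transition_matrix(old_features), K, gamma)
    v_new = k_order_ppr(transition_matrix(new_features), K, gamma)
    return float(np.mean(np.sum((v_old - v_new) ** 2, axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old = rng.normal(size=(6, 4))                 # features from the previous network
    new = old + 0.5 * rng.normal(size=(6, 4))     # drifted features from the current network
    print(subgraph_distillation_loss(old, new))
```

In this sketch the loss is zero when the old and new features induce the same local graph structure and grows as neighborhoods drift, which is the behavior a structure-preserving distillation term is meant to penalize.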

Paper Structure (15 sections, 7 equations, 5 figures, 4 tables)

Introduction
Related Work
Continual Learning
Semi-supervised Learning
Semi-supervised Continual Learning
Methods
Problem Formulation and Baseline
A Systematic Study of SSCL
Dynamic Sub-graph Distillation
Experiments
Experiment Setups
Quantitative Results
Ablation Study and Parameter Analysis
Conclusion
Acknowledgments

Figures (5)

Figure 1: Challenge analysis of semi-supervised continual learning. (a) Illustration of semi-supervised continual learning. (b) The accuracy tendency of the testing set and unlabeled training set shows a significant positive correlation.
Figure 2: Baselines that combine CL and SSL methods to tackle SSCL. (a) DistillLabel: apply distillation on labeled data. DistillALL: apply the distillation loss on the entire memory buffer. GTUnlabel: correct the pseudo-labels of unlabeled data in the memory buffer to the ground truth. (b) New(Old)_pseudo: use the predictions of the current (previous) network as pseudo-labels $\bm{p}^{\mathcal{A}}$. (c-d) The distribution bias of the fully supervised and semi-supervised settings.
Figure 3: The framework of the proposed Dynamic Sub-graph Distillation method. Given the merged batch of current and replayed samples, we first generate weak and strong augmentations of each image and obtain their semantic representations from the current network $f^t_\theta$. We then use the outputs of the weak version to construct the new graph; the corresponding old graph is built from the replayed samples. From the probability matrices $\bm{P}^R$ and $\bm{P}^N$, we derive the distillation vector, which captures local structure information and serves as a component of our sub-graph distillation loss. Guided by the distillation loss, the current network $f_\theta^t$ is trained to keep the sub-graph structure associated with each example invariant.
Figure 4: Accuracy on all learned tasks, all old tasks, and the new task on the CIFAR100-20 benchmark. (a) Accuracy on all learned tasks under different strategies. DistillLabeled and DistillAll apply representation distillation to the labeled samples and to all samples in memory, respectively; DistillAll_Replypse additionally uses the previous predictions as pseudo-labels on top of DistillAll. (b) Accuracy on new and old tasks for our method DSGD and for iCaRL&Fix.
Figure 5: Robustness testing. (a) Performance under different values of the hyperparameter $\gamma$. (b) Performance under different values of $K$. Both are evaluated on CIFAR100.
