Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
TL;DR
This work tackles the catastrophic forgetting and zero-shot degradation that arise when large vision-language models are fine-tuned on a sequence of tasks. It introduces Selective Dual-Teacher Knowledge Transfer, which uses both the most recently fine-tuned model $g_{k-1}$ and the original pre-trained model $g_0$ as teachers, selecting the appropriate teacher for each reference image using a dual-teacher discrepancy and a sigmoid-based selection score. The training objective combines standard cross-entropy with a weighted dual-teacher KD term, enabling continual learning while preserving zero-shot capabilities without access to data from previous tasks. Empirical results on eight fine-grained datasets and the MTIL/MCIL benchmarks show substantial improvements over state-of-the-art continual learning methods, with reduced forgetting and robust open-vocabulary transfer, although performance remains dependent on the distribution of the reference data.
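To make the selection mechanism concrete, the following is a minimal sketch and not the authors' implementation. It assumes the dual-teacher discrepancy is an L2 distance between the features that $g_{k-1}$ and $g_0$ produce for the same reference image, that a large discrepancy favours $g_{k-1}$, and that the KD term is a squared feature distance. The names `threshold`, `temperature`, and `kd_weight`, along with their default values, are hypothetical.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two feature vectors (assumed discrepancy measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def selection_score(feat_prev, feat_pre, threshold=0.5, temperature=0.1):
    """Sigmoid-based score in (0, 1).

    feat_prev: features from the most recently fine-tuned teacher g_{k-1}
    feat_pre:  features from the original pre-trained teacher g_0
    Assumption: a large discrepancy means the image resembles previously
    learned data, so the score moves toward 1 and favours g_{k-1}.
    """
    d = l2_distance(feat_prev, feat_pre)
    return 1.0 / (1.0 + math.exp(-(d - threshold) / temperature))


def feature_kd(student, teacher):
    """Squared L2 distance, used here as a stand-in distillation loss."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher))


def dual_teacher_kd(student, feat_prev, feat_pre, threshold=0.5, temperature=0.1):
    """Blend the two distillation losses using the selection score."""
    s = selection_score(feat_prev, feat_pre, threshold, temperature)
    return s * feature_kd(student, feat_prev) + (1.0 - s) * feature_kd(student, feat_pre)


def total_loss(ce, student, feat_prev, feat_pre, kd_weight=1.0):
    """Cross-entropy on the current task plus the weighted dual-teacher KD term."""
    return ce + kd_weight * dual_teacher_kd(student, feat_prev, feat_pre)
```

Under these assumptions, a reference image on which the two teachers agree receives a score near 0 and is distilled mostly from $g_0$, while one on which they disagree strongly receives a score near 1 and is distilled mostly from $g_{k-1}$.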
Abstract
Large-scale vision-language models (VLMs) have shown strong zero-shot generalization on unseen-domain data. However, adapting pre-trained VLMs to a sequence of downstream tasks often leads to forgetting of previously learned knowledge and a reduction in zero-shot classification performance. To tackle this problem, we propose a Selective Dual-Teacher Knowledge Transfer framework that uses the most recently fine-tuned VLM and the original pre-trained VLM as dual teachers to preserve previously learned knowledge and zero-shot capabilities, respectively. With access only to an unlabeled reference dataset, our framework performs selective knowledge distillation by measuring the feature discrepancy between the two teacher VLMs. Consequently, it mitigates catastrophic forgetting of previously learned knowledge while preserving the zero-shot capabilities of pre-trained VLMs. Extensive experiments on benchmark datasets demonstrate that our framework performs favorably against state-of-the-art continual learning approaches in preventing catastrophic forgetting and zero-shot degradation. Project page: https://chuyu.org/research/snd
