SEED: Self-supervised Distillation For Visual Representation
Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
TL;DR
SEED introduces a self-supervised distillation framework that transfers knowledge from a large teacher, pretrained with self-supervised learning (SSL), to a smaller student without using labels. By matching the teacher’s instance-similarity distribution over a dynamically updated queue, SEED enables small architectures to achieve substantially higher ImageNet performance and better transferability than traditional contrastive SSL. The approach is robust across teacher choices and distillation variants, and it improves linear, semi-supervised, and downstream task performance, including detection and segmentation. This work offers a practical path to high-quality visual representations for resource-constrained models and for deploying SSL in small-footprint settings.
Abstract
This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical finding that, while widely used contrastive self-supervised learning methods have shown great progress in training large models, they do not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of learning directly from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy on the ImageNet-1k dataset from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNet-v3-Large.
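The core idea of mimicking a teacher's similarity score distribution can be illustrated with a minimal sketch. It assumes cosine similarity between L2-normalized embeddings, a temperature-scaled softmax, and a cross-entropy objective; the function names, the temperature values, and the toy two-dimensional embeddings are all hypothetical choices for illustration and are not taken from the text above.

```python
import math
from collections import deque


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_scores(embedding, queue):
    """Cosine similarity between one embedding and every entry in the queue."""
    e = l2_normalize(embedding)
    return [sum(a * b for a, b in zip(e, l2_normalize(k))) for k in queue]


def log_softmax(scores, temperature):
    """Numerically stable log-softmax of temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(student_emb, teacher_emb, queue,
                      temp_student=0.2, temp_teacher=0.01):
    """Cross-entropy between teacher and student similarity distributions.

    The temperature defaults are illustrative assumptions only.
    """
    log_p_teacher = log_softmax(similarity_scores(teacher_emb, queue), temp_teacher)
    log_p_student = log_softmax(similarity_scores(student_emb, queue), temp_student)
    # The teacher distribution is the fixed target; the student is trained to match it.
    return -sum(math.exp(lt) * ls for lt, ls in zip(log_p_teacher, log_p_student))


# Toy usage: a fixed-size FIFO queue of (hypothetical) teacher embeddings.
queue = deque([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], maxlen=3)
teacher_emb = [0.9, 0.1]   # teacher's embedding of the current image
student_emb = [0.5, 0.5]   # student's embedding of the same image
loss = distillation_loss(student_emb, teacher_emb, queue)
queue.append(teacher_emb)  # newest teacher embedding evicts the oldest entry
```

In this sketch the loss is smallest when the student's distribution over the queue equals the teacher's, so minimizing it pushes the student to rank the queued instances the way the teacher does. A real training setup would compute this over batches with an automatic-differentiation framework and a much larger queue.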
