Simple Unsupervised Knowledge Distillation With Space Similarity
Aditya Singh, Haohan Wang
TL;DR
This work tackles the challenge of transferring knowledge from a self-supervised teacher to a smaller student without labels. It introduces CoSS, a two-part objective that combines feature-level cosine alignment with a novel space similarity term to align the student's embedding manifold with the teacher's, preserving spatial structure despite $L_2$ normalization. The method uses an offline k-nearest neighbor pre-processing step to capture local manifold structure and a simple online distillation loss $\mathcal{L}_{CoSS} = \mathcal{L}_{co} + \lambda \mathcal{L}_{ss}$, achieving state-of-the-art or competitive results across ImageNet classification, transfer learning, dense prediction, retrieval, and robustness benchmarks. The results demonstrate the practicality of manifold-aware UKD for compact models without requiring feature queues or heavy augmentations, with potential applicability to other domains.
Abstract
Recent studies show that self-supervised learning (SSL) does not readily extend to smaller architectures. One way to mitigate this shortcoming while still training a smaller network without labels is unsupervised knowledge distillation (UKD). Existing UKD approaches handcraft preservation-worthy inter/intra-sample relationships between the teacher and its student. However, this may overlook other key relationships present in the teacher's mapping. In this paper, instead of heuristically constructing preservation-worthy relationships between samples, we directly motivate the student to model the teacher's embedding manifold. If the mapped manifold is similar, all inter/intra-sample relationships are indirectly conserved. We first demonstrate that prior methods cannot preserve the teacher's latent manifold because they rely solely on $L_2$ normalised embedding features. Subsequently, we propose a simple objective to capture the information lost to normalisation. Our proposed loss component, termed \textbf{space similarity}, motivates each dimension of a student's feature space to be similar to the corresponding dimension of its teacher. We perform extensive experiments demonstrating strong performance of our proposed approach on various benchmarks.
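To make the two-part objective concrete, the following is a minimal NumPy sketch of one plausible reading of $\mathcal{L}_{CoSS} = \mathcal{L}_{co} + \lambda \mathcal{L}_{ss}$, not the authors' implementation. It assumes that $\mathcal{L}_{co}$ is the negative mean cosine similarity between matching rows (per-sample features) of the student and teacher embedding matrices, and that $\mathcal{L}_{ss}$ is the same quantity computed over matching columns (per-dimension vectors across the batch). The function names `cosine_rows` and `coss_loss`, the argument `lam`, and the choice to apply the column-wise term to unnormalised embeddings are all assumptions made for illustration.

```python
import numpy as np


def cosine_rows(a, b, eps=1e-8):
    """Cosine similarity between matching rows of two equally shaped 2-D arrays."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return np.sum(a_unit * b_unit, axis=1)


def coss_loss(student, teacher, lam=1.0):
    """Hypothetical CoSS-style loss on (batch, dim) embedding matrices.

    Returns (total, l_co, l_ss). The exact form is an assumption, see lead-in.
    """
    if student.shape != teacher.shape:
        raise ValueError("student and teacher embeddings must share a shape")
    # Feature-level term: compare each sample's embedding (rows).
    l_co = -float(np.mean(cosine_rows(student, teacher)))
    # Space-level term: compare each embedding dimension across the batch
    # (columns), obtained here by transposing both matrices.
    l_ss = -float(np.mean(cosine_rows(student.T, teacher.T)))
    return l_co + lam * l_ss, l_co, l_ss


rng = np.random.default_rng(0)
teacher_emb = rng.normal(size=(4, 3))

# A student that reproduces the teacher exactly minimises both terms.
total_same, co_same, ss_same = coss_loss(teacher_emb, teacher_emb)

# A student whose rows are rescaled copies of the teacher's rows: the
# row-wise cosine term cannot see the difference, the column-wise term can.
row_scales = np.array([1.0, 2.0, 3.0, 4.0])[:, None]
total_scaled, co_scaled, ss_scaled = coss_loss(teacher_emb * row_scales, teacher_emb)
```

In this sketch, rescaling each student row leaves `co_scaled` at its minimum while `ss_scaled` moves away from it, which is one way to see how a column-wise term can respond to information that a purely row-normalised comparison discards.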
