Delving Deep into Semantic Relation Distillation
Zhaoyi Yan, Kangjun Liu, Qixiang Ye
TL;DR
This work addresses the limitations of instance-level knowledge distillation by introducing Semantics-based Relation Knowledge Distillation (SeRKD), which leverages semantic components derived from superpixels to transfer relational knowledge. SeRKD constructs semantic superpixel tokens and applies relation-based distillation to them, an approach that aligns naturally with Vision Transformer representations and can be adapted to CNNs through tokenization. Empirical results on ImageNet-1k and downstream datasets show that SeRKD outperforms traditional KD and RKD baselines, with notable gains in transfer learning. Ablations highlight the importance of semantic clustering, grid size, and iteration settings. Overall, SeRKD provides a semantically grounded, scalable approach to model compression and knowledge transfer that enhances generalization in modern architectures.
Abstract
Knowledge distillation has become a cornerstone technique in deep learning, facilitating the transfer of knowledge from complex models to lightweight counterparts. Traditional distillation approaches focus on transferring knowledge at the instance level but fail to capture nuanced semantic relationships within the data. In response, this paper introduces Semantics-based Relation Knowledge Distillation (SeRKD), which reframes knowledge distillation in terms of the relations among semantic components within each sample. By leveraging semantic components, i.e., superpixels, SeRKD integrates superpixel-based semantic extraction with relation-based knowledge distillation, enabling a more comprehensive and context-aware transfer of knowledge for model compression. The proposed method is particularly well suited to Vision Transformers (ViTs), where visual tokens serve as the fundamental units of representation. Experimental evaluations on benchmark datasets demonstrate the superiority of SeRKD over existing methods, underscoring its efficacy in enhancing model performance and generalization.
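The idea of distilling relations among superpixel tokens can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each patch token already carries a superpixel label (for example from an off-the-shelf superpixel algorithm), pools tokens by averaging, uses cosine similarity as the relation, and matches teacher and student relations with a mean squared error. The function names `pool_superpixel_tokens`, `relation_matrix`, and `relation_distill_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pool_superpixel_tokens(tokens, labels, num_superpixels):
    """Average patch tokens of shape (N, D) within each superpixel, giving (K, D).

    Assumption: `labels` holds one integer superpixel id in [0, K) per token.
    """
    dim = tokens.shape[1]
    sums = np.zeros((num_superpixels, dim), dtype=np.float64)
    np.add.at(sums, labels, tokens)  # unbuffered scatter-add per superpixel
    counts = np.bincount(labels, minlength=num_superpixels).astype(np.float64)
    # Empty superpixels keep a zero token instead of dividing by zero.
    return sums / np.maximum(counts, 1.0)[:, None]

def relation_matrix(tokens, eps=1e-8):
    """Pairwise cosine similarity among superpixel tokens, shape (K, K)."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.maximum(norms, eps)
    return unit @ unit.T

def relation_distill_loss(student_tokens, teacher_tokens, labels, num_superpixels):
    """Mean squared error between student and teacher relation matrices.

    Assumption: MSE on cosine relations stands in for the paper's loss.
    """
    s = relation_matrix(pool_superpixel_tokens(student_tokens, labels, num_superpixels))
    t = relation_matrix(pool_superpixel_tokens(teacher_tokens, labels, num_superpixels))
    return float(np.mean((s - t) ** 2))

# Toy usage: 16 patch tokens grouped into 4 superpixels.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 4)
teacher = rng.normal(size=(16, 8))  # teacher feature width 8
student = rng.normal(size=(16, 6))  # student feature width 6
loss = relation_distill_loss(student, teacher, labels, 4)
```

Because the relation matrices have shape K x K regardless of feature width, this sketch compares a teacher and a student with different embedding dimensions without a projection layer, which is one practical appeal of relation-based distillation in general.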
