Diverse Feature Learning by Self-distillation and Reset
Sejik Park
TL;DR
Diverse Feature Learning (DFL) addresses the dual challenge of forgetting learned features and failing to acquire new ones by integrating two complementary strategies: self-distillation-based feature preservation and reset-driven exploration of new feature spaces. The method treats high-quality weights encountered during training as teachers and selects them via a meaningfulness criterion, while periodically reinitializing the student head to encourage learning new features. Empirical results on CIFAR-10 and CIFAR-100 across several lightweight architectures show that combining these components yields synergistic improvements, with ablations highlighting the importance of the number of teachers, cycle length, and the depth of the student head. The work suggests that coupling ensemble-inspired preservation with weight-space exploration can enhance feature diversity and model performance, albeit with caveats around the reliability of the meaningfulness measure and hyperparameter sensitivity.
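The teacher-selection idea can be illustrated with a minimal sketch. It assumes each weight snapshot already carries a scalar meaningfulness score (higher is better); the summary above does not say how that score is computed, so `score`, `Candidate`, and `TeacherPool` are hypothetical names for illustration and are not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(order=True)
class Candidate:
    # Hypothetical meaningfulness score (assumption: higher is better).
    score: float
    # Training step at which the snapshot was taken (used as a tie-breaker).
    step: int
    # Snapshot of the weights; excluded from ordering comparisons.
    weights: Any = field(compare=False)


class TeacherPool:
    """Keep the k highest-scoring weight snapshots seen so far."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: List[Candidate] = []  # min-heap: weakest teacher at index 0

    def offer(self, score: float, step: int, weights: Any) -> bool:
        """Offer a snapshot; return True if it was admitted as a teacher."""
        cand = Candidate(score, step, weights)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, cand)
            return True
        if cand > self._heap[0]:
            # Replace the current weakest teacher with the better snapshot.
            heapq.heapreplace(self._heap, cand)
            return True
        return False

    def teachers(self) -> List[Candidate]:
        """Return the current teachers, best first."""
        return sorted(self._heap, reverse=True)
```

In a training loop, `offer` would be called whenever a snapshot is evaluated, and `teachers()` would supply the models used for distillation. The pool size `k` plays the role of the number of teachers examined in the ablations.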
Abstract
Our paper addresses the problem of models struggling to learn diverse features, either because they forget previously learned features or because they fail to learn new ones. To overcome this problem, we introduce Diverse Feature Learning (DFL), a method that combines an algorithm for preserving important features with an algorithm for learning new features. Specifically, to preserve important features, we apply self-distillation in ensemble models, selecting the meaningful model weights observed during training as teachers. To learn new features, we employ reset, which periodically re-initializes part of the model. Experiments with various models on image classification indicate the potential for synergistic effects between self-distillation and reset.
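The two components described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the distillation loss is assumed to be the mean KL divergence from each teacher's softmax output to the student's, the reset is assumed to redraw the chosen weights from a small Gaussian, and the names `self_distillation_loss`, `should_reset`, `reset_head`, and `cycle_length` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits_list: Sequence[Sequence[float]],
) -> float:
    """Assumed loss form: mean KL(teacher || student) over all teachers."""
    q = softmax(student_logits)
    losses = [kl_divergence(softmax(t), q) for t in teacher_logits_list]
    return sum(losses) / len(losses)


def should_reset(step: int, cycle_length: int) -> bool:
    """Trigger a reset at every positive multiple of the cycle length."""
    return step > 0 and step % cycle_length == 0


def reset_head(head_weights: Sequence[float], rng: random.Random, scale: float = 0.01) -> List[float]:
    """Assumed re-initialization: redraw each weight from N(0, scale^2)."""
    return [rng.gauss(0.0, scale) for _ in head_weights]
```

A training loop would add `self_distillation_loss` to the usual classification loss on each step and call `reset_head` on the chosen part of the model whenever `should_reset` returns true. The loss is zero when the student matches every teacher and grows as the student's predictions drift away from them.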
