CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation

Jungsoo Lee; Debasmit Das; Munawar Hayat; Sungha Choi; Kyuwoong Hwang; Fatih Porikli

TL;DR

This work addresses the gap between large vision foundation models (LVFMs) and edge models with CustomKD, a two-stage knowledge distillation (KD) framework. The first stage customizes LVFM features to the student via a shared head; the second distills knowledge from both the original and the customized teacher features. By alternating feature customization and KD, and by using both task-general and task-specific supervision, CustomKD overcomes the large capacity and architecture discrepancy between teacher and student without changing the student at inference time. Empirically, it achieves state-of-the-art or competitive results on unsupervised domain adaptation and semi-supervised learning across diverse datasets and teacher backbones, while preserving edge-model efficiency. Because it leverages unlabeled data and LVFMs at no additional inference cost, the method is practical for deploying high-performing edge models in real-world settings.

Abstract

We propose a novel knowledge distillation approach, CustomKD, that effectively leverages large vision foundation models (LVFMs) to enhance the performance of edge models (e.g., MobileNetV3). Despite recent advancements in LVFMs, such as DINOv2 and CLIP, their potential in knowledge distillation for enhancing edge models remains underexplored. While knowledge distillation is a promising approach for improving the performance of edge models, the discrepancy in model capacity and the heterogeneous architectures of LVFMs and edge models pose a significant challenge. We observe that although using larger backbones (e.g., ViT-S to ViT-L) in teacher models improves their downstream task performance, knowledge distillation from these larger teachers fails to bring as much performance gain to student models as the larger backbones bring to the teachers themselves, due to the large model discrepancy. Our simple yet effective CustomKD customizes the well-generalized features inherent in LVFMs to a given student model in order to reduce this discrepancy. Specifically, beyond providing the well-generalized original knowledge from teachers, CustomKD aligns the features of teachers to those of students, making them easier for students to learn from and helping to overcome the large model discrepancy. CustomKD significantly improves the performance of edge models in scenarios with unlabeled data, such as unsupervised domain adaptation (e.g., OfficeHome and DomainNet) and semi-supervised learning (e.g., CIFAR-100 with 400 labeled samples and ImageNet with 1% labeled samples), achieving new state-of-the-art performance.
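
To make the two stages concrete, the following is a minimal sketch of what the two training objectives could look like, not the authors' implementation. Several things here are assumptions that the text above does not specify: the names `proj` and `adapter` are hypothetical linear maps, cross-entropy is assumed for the customization stage, mean squared error is assumed for feature imitation, the original teacher features are assumed to play the task-general role and the customized ones the task-specific role, and `alpha` is an illustrative weighting. Plain Python lists stand in for tensors.

```python
import math

def linear(weights, x):
    """Apply a bias-free linear map; `weights` is a list of rows, `x` a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def customization_loss(teacher_feat, proj, student_head, label):
    """Stage 1 (assumed form): map the frozen teacher feature into the
    student's feature space with `proj` (hypothetical), then score it with
    the student's own head, which is shared and kept frozen.
    Only `proj` would be updated when minimizing this loss."""
    customized = linear(proj, teacher_feat)
    return cross_entropy(linear(student_head, customized), label)

def distillation_loss(student_feat, teacher_feat, proj, adapter, alpha=0.5):
    """Stage 2 (assumed form): the student imitates both teacher views.
    - task-specific term: match the customized teacher feature directly
    - task-general term: match the original teacher feature through
      `adapter`, a hypothetical student-to-teacher projection
    Only the student (and `adapter`) would be updated here."""
    customized = linear(proj, teacher_feat)
    specific = mse(student_feat, customized)
    general = mse(linear(adapter, student_feat), teacher_feat)
    return alpha * specific + (1 - alpha) * general

# Toy dimensions: teacher feature size 3, student feature size 2, 2 classes.
teacher_feat = [1.0, 0.0, 2.0]
proj = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]           # teacher -> student space
student_head = [[1.0, 0.0], [0.0, 1.0]]             # student classifier head
adapter = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]      # student -> teacher space

stage1 = customization_loss(teacher_feat, proj, student_head, label=1)
stage2_aligned = distillation_loss([1.0, 2.0], teacher_feat, proj, adapter)
stage2_unaligned = distillation_loss([0.0, 0.0], teacher_feat, proj, adapter)
```

In a full training loop these two losses would be minimized in alternation, as the TL;DR describes. The projection and adapter exist only during training, so the deployed student keeps its original architecture and inference cost.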
