InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation
Jinbin Huang, Wenbin He, Liang Gou, Liu Ren, Chris Bryan
TL;DR
InFiConD tackles the challenge of deploying large pretrained vision models in resource-constrained environments by enabling interpretable, concept-based knowledge distillation (KD) and no-code fine-tuning. The framework builds a model-agnostic KD pipeline that maps images into 584-dimensional, text-labeled concept vectors drawn from a CLIP-S4-based concept corpus, and trains an ensemble of $k=20$ linear student models using binary cross-entropy (BCE) loss with $L_1$ regularization ($\lambda=10^{-4}$). A no-code concept tuning workflow lets users uptune or downtune specific concepts by constraining the corresponding weights to bounds scaled by a factor $x$ around their original values, and keeps a provenance log of tuning actions and their performance impacts. Validation via usage scenarios and a user study demonstrates that the human-in-the-loop, visualization-driven approach supports effective knowledge transfer, rapid fine-tuning, and broader accessibility for domain-specific applications, without extensive coding requirements.
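To make the student training objective concrete, the following is a minimal NumPy sketch of a linear student fitted to a teacher's output probabilities with BCE loss plus an $L_1$ penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names, the plain (sub)gradient descent optimizer, the learning rate, the bootstrap resampling used to form the ensemble, and the synthetic 32-concept data are all hypothetical choices made for this example. Only $k=20$ and $\lambda=10^{-4}$ come from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_l1_loss(w, b, X, y, lam):
    """BCE between student output and (soft) teacher targets, plus lam * ||w||_1."""
    p = np.clip(sigmoid(X @ w + b), 1e-7, 1 - 1e-7)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return bce + lam * np.sum(np.abs(w))

def train_student(X, y, lam=1e-4, lr=0.1, steps=500):
    """Fit one linear student with plain (sub)gradient descent (assumed optimizer)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        # Gradient of the BCE term plus a subgradient of the L1 term.
        w -= lr * (X.T @ (p - y) / n + lam * np.sign(w))
        b -= lr * np.mean(p - y)
    return w, b

def train_ensemble(X, y, k=20, lam=1e-4, seed=0):
    """Train k students; bootstrap resampling is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    students = []
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))
        students.append(train_student(X[idx], y[idx], lam=lam))
    return students

def ensemble_predict(students, X):
    """Average the students' predicted probabilities."""
    return np.mean([sigmoid(X @ w + b) for w, b in students], axis=0)

# Synthetic stand-in data: 300 "images" with 32 concept scores each
# (the paper uses 584 concepts; 32 keeps the demo small).
rng = np.random.default_rng(0)
X = rng.random((300, 32))
true_w = np.zeros(32)
true_w[:5] = 3.0                                  # only five concepts matter
teacher_prob = sigmoid((X - 0.5) @ true_w)        # stand-in teacher responses

students = train_ensemble(X, teacher_prob, k=20)
pred = ensemble_predict(students, X)
```

Because each student is linear in the concept scores, its weight vector can be read directly as per-concept influence, which is what makes concept-level tuning possible downstream.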
Abstract
The emergence of large-scale pre-trained models has expanded their use in various downstream tasks, yet deployment remains a challenge in environments with limited computational resources. Knowledge distillation has emerged as a solution in such scenarios, whereby knowledge from large teacher models is transferred into smaller student models, but this is a non-trivial process that traditionally requires technical expertise in AI/ML. To address these challenges, this paper presents InFiConD, a novel framework that leverages visual concepts to implement the knowledge distillation process and enable subsequent no-code fine-tuning of student models. We develop a novel knowledge distillation pipeline based on extracting text-aligned visual concepts from a concept corpus using multimodal models, and construct highly interpretable linear student models based on visual concepts that mimic a teacher model in a response-based manner. InFiConD's interface allows users to interactively fine-tune the student model by manipulating concept influences directly in the user interface. We validate InFiConD via a robust usage scenario and user study. Our findings indicate that InFiConD's human-in-the-loop and visualization-driven approach enables users to effectively create and analyze student models, understand how knowledge is transferred, and efficiently perform fine-tuning operations. We discuss how this work highlights the potential of interactive and visual methods in making knowledge distillation and subsequent no-code fine-tuning more accessible and adaptable to a wider range of users with domain-specific demands.
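One way to picture what "manipulating concept influences" could mean for a linear student is a set of box constraints on its weights. The sketch below is a hypothetical illustration only: the bound formula, the scale factor `x`, the choice to freeze untuned concepts, and the log format are all assumptions of this example and are not taken from the paper. An uptuned concept may only grow above its original weight, a downtuned concept may only shrink below it, and any proposed update is clipped back into those bounds.

```python
import numpy as np

def tuning_bounds(w0, tuned, x=0.5, min_span=1e-3):
    """Per-weight [lower, upper] bounds around the original weights w0.

    tuned maps a concept index to "up" or "down". Untuned weights are frozen
    at their original value (an assumption of this sketch).
    """
    lower, upper = w0.copy(), w0.copy()
    for j, direction in tuned.items():
        span = x * max(abs(w0[j]), min_span)   # width scaled by the original weight
        if direction == "up":
            upper[j] = w0[j] + span
        elif direction == "down":
            lower[j] = w0[j] - span
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return lower, upper

def project(w, lower, upper):
    """Clip a proposed weight vector back into the allowed box."""
    return np.clip(w, lower, upper)

# Example: uptune concept 0, downtune concept 3.
w0 = np.array([0.8, -0.2, 0.0, 1.5])
tuned = {0: "up", 3: "down"}
lower, upper = tuning_bounds(w0, tuned, x=0.5)

proposed = w0 + np.array([1.0, 0.3, 0.0, -2.0])   # e.g. an unconstrained update
w_new = project(proposed, lower, upper)

# A very simple provenance record of the action (hypothetical format).
provenance = [{"action": tuned, "x": 0.5, "weights_after": w_new.tolist()}]
```

In a full fine-tuning loop, `project` would be applied after every gradient step, so the student can adapt to the user's request without drifting arbitrarily far from the distilled weights.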
