Knowledge Distillation Detection for Open-weights Models
Qin Shi, Amber Yijia Zheng, Qifan Song, Raymond A. Yeh
TL;DR
The paper introduces knowledge distillation detection in an open-weights setting, where only the student's weights and the teacher's API are available. It proposes a model-agnostic, three-stage framework that synthesizes inputs without access to training data, computes alignment scores between the student and each candidate teacher, and selects the most likely teacher by maximizing that score. The approach is instantiated for image classification and text-to-image generation with task-specific input construction and scoring schemes, yielding substantial improvements over baselines on CIFAR-10, ImageNet, and several diffusion-based targets. The results indicate that the framework generalizes across tasks and has practical potential for model provenance and IP protection, with future work extending to diffusion and large language models.
Abstract
We propose the task of knowledge distillation detection, which aims to determine whether a student model has been distilled from a given teacher, under a practical setting where only the student's weights and the teacher's API are available. This problem is motivated by growing concerns about model provenance and unauthorized replication through distillation. To address this task, we introduce a model-agnostic framework that combines data-free input synthesis and statistical score computation for detecting distillation. Our approach is applicable to both classification and generative models. Experiments on diverse architectures for image classification and text-to-image generation show that our method improves detection accuracy over the strongest baselines by 59.6% on CIFAR-10, 71.2% on ImageNet, and 20.0% for text-to-image generation. The code is available at https://github.com/shqii1j/distillation_detection.
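The three stages described above (data-free input synthesis, score computation, and selection of the highest-scoring teacher) can be illustrated with a toy sketch. This is not the paper's implementation: the linear softmax "models", the Gaussian probe inputs in `synthesize_inputs`, and the negative-KL `alignment_score` are all assumptions chosen only to make the pipeline concrete, and every function name here is hypothetical. The teachers are plain callables standing in for API access.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear_classifier(weights):
    """Toy model: one weight vector per class; returns x -> class probabilities."""
    def predict(x):
        return softmax([sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights])
    return predict


def synthesize_inputs(n, dim, rng):
    # Stage 1 (assumption): random Gaussian probes stand in for the
    # task-specific, data-free input synthesis.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def alignment_score(student, teacher_api, inputs, eps=1e-12):
    # Stage 2 (assumption): negative mean KL(teacher || student) over the
    # synthesized inputs; higher means the outputs agree more closely.
    total_kl = 0.0
    for x in inputs:
        p_t, p_s = teacher_api(x), student(x)
        total_kl += sum(
            t * (math.log(t + eps) - math.log(s + eps)) for t, s in zip(p_t, p_s)
        )
    return -total_kl / len(inputs)


def detect_teacher(student, teachers, inputs):
    # Stage 3: pick the candidate teacher with the highest alignment score.
    scores = {name: alignment_score(student, api, inputs) for name, api in teachers.items()}
    return max(scores, key=scores.get), scores


rng = random.Random(0)
dim, num_classes = 8, 4

# Three independent candidate teachers with random weights.
teacher_weights = {
    f"teacher_{i}": [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]
    for i in range(3)
}
teachers = {name: make_linear_classifier(w) for name, w in teacher_weights.items()}

# Toy "distilled" student: teacher_1's weights plus small noise.
student_weights = [
    [w + rng.gauss(0.0, 0.05) for w in row] for row in teacher_weights["teacher_1"]
]
student = make_linear_classifier(student_weights)

inputs = synthesize_inputs(200, dim, rng)
best, scores = detect_teacher(student, teachers, inputs)
print(best, scores)
```

In this toy setting the student is a lightly perturbed copy of `teacher_1`, so its output distributions sit far closer to that teacher than to the unrelated ones. A real detector would need the task-specific input construction and scoring schemes that the paper describes for classification and text-to-image models.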
