Knowledge Distillation Detection for Open-weights Models

Qin Shi, Amber Yijia Zheng, Qifan Song, Raymond A. Yeh

TL;DR

The paper introduces knowledge distillation detection in an open-weights setting, where only the student's weights and the teacher's API are available. It proposes a model-agnostic, three-stage framework that synthesizes inputs without access to training data, computes alignment scores between the student and each candidate teacher, and selects the candidate with the highest score as the most likely teacher. The approach is instantiated for image classification and text-to-image generation with task-specific input construction and scoring schemes, improving detection accuracy over the strongest baselines by 59.6% on CIFAR-10, 71.2% on ImageNet, and 20.0% for text-to-image generation across several diffusion-based targets. The results suggest strong generalizability and practical potential for model provenance and IP protection, with future work extending to diffusion and large language models.

Abstract

We propose the task of knowledge distillation detection, which aims to determine whether a student model has been distilled from a given teacher, under a practical setting where only the student's weights and the teacher's API are available. This problem is motivated by growing concerns about model provenance and unauthorized replication through distillation. To address this task, we introduce a model-agnostic framework that combines data-free input synthesis and statistical score computation for detecting distillation. Our approach is applicable to both classification and generative models. Experiments on diverse architectures for image classification and text-to-image generation show that our method improves detection accuracy over the strongest baselines by 59.6% on CIFAR-10, 71.2% on ImageNet, and 20.0% for text-to-image generation. The code is available at https://github.com/shqii1j/distillation_detection.
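
The three stages described above (input synthesis, score computation, teacher selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy models, the random-noise `generator`, and the use of negative KL divergence as the alignment score are all assumptions made for illustration, and the names `detect_teacher`, `neg_kl`, `teacher_a`, and `teacher_b` are hypothetical. Each teacher is treated as a black-box callable, standing in for API access.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def neg_kl(p, q, eps=1e-12):
    """Assumed alignment score: negative KL(p || q). Higher means closer."""
    return -sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def detect_teacher(student, teachers, generator, num_inputs=64):
    """Return (name of the highest-scoring candidate teacher, all scores)."""
    # Stage 1: construct inputs without any training data.
    inputs = [generator() for _ in range(num_inputs)]
    # Stage 2: score the student against each candidate teacher.
    scores = {}
    for name, teacher in teachers.items():
        per_input = [
            neg_kl(softmax(teacher(x)), softmax(student(x))) for x in inputs
        ]
        scores[name] = sum(per_input) / len(per_input)
    # Stage 3: predict the teacher with the highest aggregated score.
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins (hypothetical): each "model" maps a 2-D input to 3 logits.
def teacher_a(x):
    return [x[0], x[1], -x[0]]


def teacher_b(x):
    return [-x[1], x[0], x[1]]


def student(x):
    # Imitates teacher_a up to a small perturbation.
    return [0.9 * x[0], 1.1 * x[1], -x[0]]


rng = random.Random(0)


def generator():
    # Assumption: plain Gaussian noise as the synthesized input.
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


best, scores = detect_teacher(
    student, {"teacher_a": teacher_a, "teacher_b": teacher_b}, generator
)
print(best, scores)
```

In this toy setup the student's outputs stay close to those of `teacher_a` on every input, so its averaged score is the higher of the two. The paper's task-specific input construction and scoring schemes are more involved than the noise inputs and KL score assumed here.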

Paper Structure

This paper contains 16 sections, 16 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Knowledge distillation detection pipeline. The framework consists of three stages: input construction via a generator $G$, score computation between the student and candidate teachers, and prediction by selecting the teacher with the highest aggregated score.
  • Figure 2: Accuracy of knowledge distillation detection across different distillation methods. CIFAR-10 (left) and ImageNet (right).
  • Figure 3: KL divergence (left) and 1 - ACS (right) between the student and teacher (or independent model) outputs as a function of the distillation weight $\lambda$. The total loss used to train the student is defined in Eq. \ref{eq:kd_loss}, combining a hard cross-entropy term and a soft KL divergence term.
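
The Figure 3 caption describes a student loss that mixes a hard cross-entropy term with a soft KL divergence term through the weight $\lambda$. The exact form of the paper's equation is not reproduced on this page, so the sketch below assumes a common convex combination, $(1 - \lambda)\,\mathrm{CE} + \lambda\,\mathrm{KL}$, with no temperature scaling. The function name `kd_loss` and its signature are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, lam):
    """Assumed form: (1 - lam) * CE(label, student) + lam * KL(teacher || student)."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    # Hard term: cross-entropy against the ground-truth label.
    hard = -math.log(p_s[label])
    # Soft term: KL divergence from the teacher to the student distribution.
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return (1.0 - lam) * hard + lam * soft


student_logits = [2.0, 0.0, -1.0]
teacher_logits = [1.5, 0.5, -1.0]
for lam in (0.0, 0.5, 1.0):
    print(lam, kd_loss(student_logits, teacher_logits, label=0, lam=lam))
```

Under this assumed form, $\lambda = 0$ trains the student on labels alone and $\lambda = 1$ trains it purely to match the teacher, which is the range the figure sweeps over.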