Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model

Jinyin Chen; Xiaoming Zhao; Haibin Zheng; Xiao Li; Sheng Xiang; Haifeng Guo

TL;DR

RobustKD addresses the risk of backdoors transferring from poisoned teacher models to student models during knowledge distillation. It introduces a variance-based feature detoxification mechanism that optimizes a depoisoning mask to minimize backdoor signals while preserving distillation performance, formalized with losses $L_{ce}$ and $L_f$. Across multiple datasets and architectures, RobustKD achieves substantial backdoor mitigation (average ASR reduction ~85%) with minimal main-task degradation (~3–4%), and demonstrates robustness against adaptive attacks. This approach provides a practical framework for secure model compression on edge devices, balancing security and performance in knowledge distillation workflows.

Abstract

Benefiting from well-trained deep neural networks (DNNs), model compression has attracted particular attention for equipment with limited computing resources, especially edge devices. Knowledge distillation (KD) is one of the most widely used compression techniques for edge deployment: it obtains a lightweight student model from a well-trained teacher model released on a public platform. However, it has been empirically observed that a backdoor in the teacher model is transferred to the student model during KD. Although numerous KD methods have been proposed, most of them focus on distilling a high-performing student model without considering robustness. In addition, some research adopts KD techniques as effective backdoor mitigation tools, but these approaches do not perform model compression at the same time. Consequently, achieving both objectives of robust KD, i.e., student model performance and backdoor mitigation, remains an open problem. To address these issues, we propose RobustKD, a robust knowledge distillation method that compresses the model while mitigating the backdoor based on feature variance. Specifically, RobustKD differs from previous work in three key aspects: (1) effectiveness: by distilling the feature map of the teacher model after detoxification, the main-task performance of the student model is comparable to that of the teacher model; (2) robustness: by reducing the feature variance between the teacher model and the student model, it mitigates the backdoor of the student model in the backdoored-teacher scenario; (3) generality: RobustKD still performs well across multiple data models (e.g., WRN 28-4, Pyramid-200) and diverse DNNs (e.g., ResNet50, MobileNet).
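
As a rough illustration of how a masked feature-distillation objective of this kind could be combined from a classification loss $L_{ce}$ and a feature loss $L_f$, the sketch below uses NumPy. It is not the authors' implementation: the function names, the per-channel mask shape, the mean-squared-error form of the feature loss, and the weighting factor `lam` are all assumptions made for illustration, and the mask is taken as a given input rather than optimized.

```python
import numpy as np


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (stand-in for L_ce)."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])


def masked_feature_loss(student_feat, teacher_feat, mask):
    """Mean squared error between student features and masked teacher features.

    Stand-in for L_f. `mask` is a hypothetical depoisoning mask with values
    in [0, 1] that broadcasts over the teacher feature map.
    """
    detoxified = mask * teacher_feat  # suppress the masked teacher features
    return float(np.mean((student_feat - detoxified) ** 2))


def total_loss(logits, label, student_feat, teacher_feat, mask, lam=1.0):
    """Assumed combination: L_ce + lam * L_f (lam is a hypothetical weight)."""
    return cross_entropy(logits, label) + lam * masked_feature_loss(
        student_feat, teacher_feat, mask
    )


# Toy example: two channels with two spatial positions each.
teacher = np.array([[1.0, 2.0], [3.0, 4.0]])
student = np.array([[1.0, 2.0], [0.0, 0.0]])
mask = np.array([[1.0], [0.0]])  # keep channel 0, suppress channel 1

loss = total_loss(np.array([0.0, 0.0]), 0, student, teacher, mask)
```

In this toy setup the student already matches the masked teacher features, so only the classification term contributes to `loss`. With an all-ones mask the student would instead be pulled toward every teacher channel, including any that carry a backdoor signal.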

Paper Structure (33 sections, 15 equations, 7 figures, 7 tables)

Introduction
Related Work
Knowledge Distillation
Backdoor Attack
Backdoor Defense
Adversarial Distillation
Preliminary
Knowledge Distillation
Threat Model
Formalization of Robust Knowledge Distillation
Methodology
Overview
Feature Detoxification
Feature initialization
Detoxification feature generation
...and 18 more sections

Figures (7)

Figure 1: An illustration of threats suffered by DNNs during compression. The WRN28-4 model was poisoned with LBA, and we uploaded it to Hugging Face and downloaded the poisoned model using a separate account. After implementing FKD, the distilled student model still had a backdoor.
Figure 2: Variance of examples of different attack methods
Figure 3: The overview of RobustKD. RobustKD achieves robust feature distillation by performing two key steps on the feature distillation process: (I) feature detoxification and (II) feature distillation.
Figure 4: Effect of $m$ threshold on RobustKD
Figure 5: Performance under different detoxification methods. The WRN 28-4 teacher model was attacked using CKD on CIFAR-100; the ACC and ASR of the teacher model were 78.46% and 94.14%, respectively.
...and 2 more figures
