UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations

Fengming Yu; Haiwei Pan; Kejia Zhang; Jian Guan; Haiying Jiang

TL;DR

UHKD tackles heterogeneous knowledge distillation (KD) by transferring intermediate knowledge through frequency-domain representations, bridging the semantic gaps between architecturally different teacher and student models. It introduces a dual-module pipeline: the Feature Transformation Module (FTM) converts teacher features into compact, refined frequency-domain representations, while the learnable Feature Alignment Module (FAM) projects student features into the same spectral space. Training uses a joint objective that combines a frequency-domain mean squared error (MSE) loss with a logit-based Kullback-Leibler (KL) divergence loss and the standard cross-entropy loss. The framework yields consistent gains over state-of-the-art heterogeneous KD methods on CIFAR-100 and ImageNet-1K and remains robust in homogeneous settings. Empirical results, ablations, and visual analyses confirm that frequency-domain representations capture global semantics and mitigate architectural discrepancies, enabling scalable and efficient cross-architecture knowledge transfer.
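The FTM/FAM pipeline can be illustrated with a minimal NumPy sketch. The exact transform is not specified here, so this assumes one plausible design: a 2-D FFT over the spatial axes that keeps only the amplitudes of a small low-frequency window as the compact representation, plus a 1x1 channel projection standing in for the learnable part of FAM. The names `ftm`, `fam`, `multi_level_mse`, `keep` and `proj` are hypothetical, and `proj` is random here rather than trained.

```python
import numpy as np


def ftm(feat: np.ndarray, keep: int = 4) -> np.ndarray:
    """Hypothetical FTM stand-in: low-frequency amplitude spectrum.

    feat has shape (C, H, W) with H, W >= keep. The result has shape
    (C, keep, keep) whatever H and W are.
    """
    # norm="forward" divides by H*W, so the DC coefficient equals the spatial mean.
    spec = np.fft.fft2(feat, axes=(-2, -1), norm="forward")
    # Move the zero-frequency (DC) term to the centre of each map.
    spec = np.fft.fftshift(spec, axes=(-2, -1))
    h, w = feat.shape[-2:]
    top, left = h // 2 - keep // 2, w // 2 - keep // 2
    # Keep only a central keep x keep window, i.e. the lowest frequencies.
    return np.abs(spec[:, top:top + keep, left:left + keep])


def fam(student_feat: np.ndarray, proj: np.ndarray, keep: int = 4) -> np.ndarray:
    """Hypothetical FAM stand-in: 1x1 channel projection, then the same transform.

    proj has shape (C_teacher, C_student); in training it would be learned.
    """
    projected = np.einsum("ts,shw->thw", proj, student_feat)
    return ftm(projected, keep)


def multi_level_mse(teacher_feats, student_feats, projs, keep: int = 4) -> float:
    """Mean frequency-domain MSE over several (teacher, student) feature levels."""
    losses = [np.mean((ftm(t, keep) - fam(s, p, keep)) ** 2)
              for t, s, p in zip(teacher_feats, student_feats, projs)]
    return float(np.mean(losses))


rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 14, 14))  # e.g. ViT tokens reshaped to a 14x14 grid
student = rng.standard_normal((4, 32, 32))  # e.g. a CNN feature map
proj = 0.1 * rng.standard_normal((8, 4))    # would be learned; random here
print(ftm(teacher).shape, fam(student, proj).shape)  # (8, 4, 4) (8, 4, 4)
print(multi_level_mse([teacher], [student], [proj]))
```

Because the cropped spectrum has a fixed shape regardless of spatial resolution, a 14x14 teacher grid and a 32x32 student map can be compared directly in this sketch; whether UHKD handles resolution mismatches the same way is an assumption.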

Abstract

Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing computational and storage costs while maintaining competitive accuracy. However, most existing KD methods are tailored to homogeneous models and perform poorly in heterogeneous settings, particularly when intermediate features are involved. Semantic discrepancies across architectures hinder effective use of the teacher's intermediate representations, while prior heterogeneous KD studies focus mainly on the logit space, underutilizing the rich semantic information in intermediate layers. To address these issues, this work proposes Unified Heterogeneous Knowledge Distillation (UHKD), a framework that transfers intermediate features across architectures in the frequency domain. Frequency-domain representations capture global semantic knowledge and mitigate representational discrepancies between heterogeneous teacher-student pairs. Specifically, a Feature Transformation Module (FTM) generates compact frequency-domain representations of teacher features, while a learnable Feature Alignment Module (FAM) projects student features and aligns them with the teacher's through multi-level matching. Training is guided by a joint objective that combines mean squared error (MSE) on intermediate features with Kullback-Leibler (KL) divergence on logits. Extensive experiments on CIFAR-100 and ImageNet-1K demonstrate the effectiveness of the proposed approach, with maximum gains of 5.59% and 0.83%, respectively, over the latest heterogeneous distillation method. Code will be released soon.
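The joint objective can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the temperature `T` and the weights `alpha` and `beta` are hypothetical placeholders, the KL term uses the standard temperature-scaled formulation from logit-based KD, and the cross-entropy term follows the TL;DR's description of the objective. The frequency-domain features are assumed to have already been produced by FTM (teacher) and FAM (student).

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def joint_loss(student_logits, teacher_logits, labels, student_freq, teacher_freq,
               T=4.0, alpha=1.0, beta=1.0):
    """CE + alpha * KL(teacher || student) + beta * frequency-domain MSE.

    T, alpha and beta are hypothetical hyperparameters, not values from the paper.
    """
    n = len(labels)
    # Cross-entropy of the student against ground-truth labels.
    ce = -np.mean(log_softmax(student_logits)[np.arange(n), labels])
    # KL divergence between temperature-softened teacher and student distributions,
    # multiplied by T**2 to keep gradient magnitudes comparable across temperatures.
    log_pt = log_softmax(teacher_logits / T)
    log_ps = log_softmax(student_logits / T)
    kl = np.mean(np.sum(np.exp(log_pt) * (log_pt - log_ps), axis=-1)) * T ** 2
    # MSE between the teacher's frequency representation and the aligned student's.
    mse = np.mean((student_freq - teacher_freq) ** 2)
    total = ce + alpha * kl + beta * mse
    return float(total), {"ce": float(ce), "kl": float(kl), "mse": float(mse)}


rng = np.random.default_rng(0)
s_logits, t_logits = rng.standard_normal((2, 5, 100))  # batch of 5, 100 classes
labels = rng.integers(0, 100, size=5)
s_freq, t_freq = rng.standard_normal((2, 5, 8, 4, 4))
total, parts = joint_loss(s_logits, t_logits, labels, s_freq, t_freq)
print(total, parts)
```

Setting `beta=0` reduces this sketch to plain logit-based KD, which is a convenient sanity check when ablating the frequency-domain term.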
