Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu, Lu Wang, Lisheng Xu, Jun Cheng
TL;DR
This paper addresses the transfer of fine-grained inter-class knowledge in knowledge distillation by introducing Local Dense Relational Logit Distillation (LDRLD). LDRLD recursively decouples and recombines logit information to build dense, informative inter-class relationships, and pairs this with an Adaptive Decay Weight (ADW) strategy that uses Inverse Rank Weighting (IRW) and Exponential Rank Decay (ERD) to emphasize closely related categories. It also distills the remaining non-target knowledge to ensure completeness, and combines these components into a single total loss. The method improves student performance on CIFAR-100, Tiny-ImageNet, and ImageNet-1K, with further demonstrations on object detection and fine-grained tasks. It yields consistent improvements over state-of-the-art logit-based KD methods, generalizes robustly, and is supported by visualization analyses showing closer teacher-student logit alignment and more targeted attention.
Abstract
State-of-the-art logit distillation methods exhibit versatility, simplicity, and efficiency. Despite these advances, existing studies have yet to explore fine-grained relationships within logit knowledge thoroughly. In this paper, we propose Local Dense Relational Logit Distillation (LDRLD), a novel method that captures inter-class relationships by recursively decoupling and recombining logit information, thereby providing more detailed and clearer insights for student learning. To further improve performance, we introduce an Adaptive Decay Weight (ADW) strategy, which dynamically adjusts the weights of critical category pairs using Inverse Rank Weighting (IRW) and Exponential Rank Decay (ERD). Specifically, IRW assigns weights inversely proportional to the rank differences between pairs, while ERD adaptively controls weight decay based on the total ranking scores of category pairs. Furthermore, after the recursive decoupling, we distill the remaining non-target knowledge to ensure knowledge completeness and enhance performance. Ultimately, our method improves the student's performance by transferring fine-grained knowledge and emphasizing the most critical relationships. Extensive experiments on datasets such as CIFAR-100, ImageNet-1K, and Tiny-ImageNet demonstrate that our method compares favorably with state-of-the-art logit-based distillation approaches. The code will be made publicly available.
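To make the weighting idea concrete, the following is a minimal NumPy sketch of pairwise logit relations weighted by IRW and ERD. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the formulas, so the forms `1 / |r_i - r_j|` for IRW and `exp(-beta * (r_i + r_j))` for ERD, the use of teacher-based ranks starting at 1, the two-class softmax per pair, and the names `adw_weight`, `local_pair_loss`, `d`, `tau`, and `beta` are all hypothetical choices. The remaining non-target term mentioned above is omitted.

```python
from itertools import combinations

import numpy as np


def softmax(z, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def adw_weight(rank_i, rank_j, beta=0.5):
    """Assumed ADW weight for a category pair with distinct ranks (1 = top)."""
    # IRW (assumed form): inversely proportional to the rank difference.
    irw = 1.0 / abs(rank_i - rank_j)
    # ERD (assumed form): exponential decay in the pair's total rank score.
    erd = np.exp(-beta * (rank_i + rank_j))
    return irw * erd


def local_pair_loss(teacher_logits, student_logits, d=3, tau=1.0, beta=0.5):
    """Weighted sum of pairwise KL terms over the teacher's top-d classes."""
    t = np.asarray(teacher_logits, dtype=float)
    s = np.asarray(student_logits, dtype=float)
    top = np.argsort(-t)[:d]  # class indices ordered by teacher logit (assumption)
    loss = 0.0
    # Every pair of top-d classes forms one local relation.
    for (ri, ci), (rj, cj) in combinations(enumerate(top, start=1), 2):
        pt = softmax(t[[ci, cj]], tau)  # teacher's two-class distribution
        ps = softmax(s[[ci, cj]], tau)  # student's two-class distribution
        kl = float(np.sum(pt * (np.log(pt) - np.log(ps))))
        loss += adw_weight(ri, rj, beta) * kl
    return loss
```

Under these assumptions, a pair of adjacent top-ranked classes (ranks 1 and 2) receives a larger weight than a pair that is farther apart or lower in the ranking, and the loss is zero when student and teacher logits coincide.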
