Data-free Knowledge Distillation for Fine-grained Visual Categorization
Renrong Shao, Wei Zhang, Jianhua Yin, Jun Wang
TL;DR
The paper tackles fine-grained visual categorization under data-free knowledge distillation by introducing DFKD-FGVC, an adversarial framework that combines a spatially attentive generator, mixed high-order attention distillation (MHAD), and semantic feature contrast learning (SFCL). By synthesizing discriminative, fine-grained images and aligning high-level semantic representations between teacher and student in hyperspace, the method achieves state-of-the-art results on FGVC benchmarks without real data. Key contributions include the spatial attention generator, MHAD to model part interactions, and SFCL to maximize semantic separability, all validated through experiments, ablations, and visual analyses. The approach enables privacy-preserving model compression and deployment in data-restricted settings while maintaining robust fine-grained performance.
Abstract
Data-free knowledge distillation (DFKD) is a promising approach for addressing issues related to model compression, privacy and security, and transmission restrictions. Although existing DFKD methods have achieved encouraging results in coarse-grained classification, they yield sub-optimal results in practical fine-grained classification tasks, which require more detailed distinctions between similar categories. To address this issue, we propose DFKD-FGVC, an approach that extends DFKD to fine-grained visual categorization (FGVC) tasks. Our approach uses an adversarial distillation framework with an attention generator, mixed high-order attention distillation, and semantic feature contrast learning. Specifically, we introduce a spatial-wise attention mechanism into the generator to synthesize fine-grained images with more detail in the discriminative parts. We also use a mixed high-order attention mechanism to capture complex interactions among parts and the subtle differences among discriminative features of fine-grained categories, attending to both local features and semantic context relationships. Moreover, we leverage the teacher and student models of the distillation framework to contrast high-level semantic feature maps in the hyperspace, comparing variances of different categories. We evaluate our approach on three widely used FGVC benchmarks (Aircraft, Cars196, and CUB200) and demonstrate its superior performance.
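
To make the idea of contrasting teacher and student features more concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive objective. It is an illustration under stated assumptions, not the loss defined in the paper: the function names `cosine` and `contrastive_loss`, the `temperature` value, and the choice of treating the teacher feature of the same synthetic sample as the positive and the teacher features of the other samples as negatives are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_loss(student_feats, teacher_feats, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact SFCL).

    For sample i, the student feature is the anchor, the teacher feature of
    the same sample is the positive, and the teacher features of all other
    samples in the batch act as negatives.
    """
    n = len(student_feats)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(student_feats[i], teacher_feats[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching teacher feature.
        total += log_denom - logits[i]
    return total / n


# Toy features: two samples whose teacher features point in different directions.
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned_student = [[1.0, 0.0], [0.0, 1.0]]
swapped_student = [[0.0, 1.0], [1.0, 0.0]]

aligned = contrastive_loss(aligned_student, teacher)
swapped = contrastive_loss(swapped_student, teacher)
```

In this toy setup the loss is small when each student feature matches its own teacher feature and large when the features are swapped, which is the behavior a contrastive term of this kind is meant to encourage.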
